Preface


The Probabilistic Method is one of the most powerful and widely used tools applied 
in combinatorics. One of the major reasons for its rapid development is the impor- 
tant role of randomness in theoretical computer science and in statistical physics. 

The interplay between discrete mathematics and computer science suggests an 
algorithmic point of view in the study of the probabilistic method in combinatorics 
and this is the approach we tried to adopt in this book. The book thus includes a dis- 
cussion of algorithmic techniques together with a study of the classical method as 
well as the modern tools applied in it. The first part of the book contains a descrip- 
tion of the tools applied in probabilistic arguments, including the basic techniques 
that use expectation and variance, as well as the more recent applications of martin- 
gales and correlation inequalities. The second part includes a study of various topics 
in which probabilistic techniques have been successful. This part contains chapters 
on discrepancy and random graphs, as well as on several areas in theoretical com- 
puter science: circuit complexity, computational geometry, and derandomization of 
randomized algorithms. Scattered between the chapters are gems described under 
the heading The Probabilistic Lens. These are elegant proofs that are not necessarily 
related to the chapters after which they appear and can usually be read separately. 

The basic Probabilistic Method can be described as follows: In order to prove the 
existence of a combinatorial structure with certain properties, we construct an ap- 
propriate probability space and show that a randomly chosen element in this space 
has the desired properties with positive probability. This method was initiated by 
Paul Erdős, who contributed so much to its development over a fifty-year period,
that it seems appropriate to call it “The Erdős Method.” His contribution can be
measured not only by his numerous deep results in the subject, but also by his many 
intriguing problems and conjectures that stimulated a big portion of the research in 
the area. 

It seems impossible to write an encyclopedic book on the Probabilistic Method; 
too many recent interesting results apply probabilistic arguments, and we do not 
even try to mention all of them. Our emphasis is on methodology, and we thus try to 
describe the ideas, and not always to give the best possible results if these are too 
technical to allow a clear presentation. Many of the results are asymptotic, and we 
use the standard asymptotic notation: for two functions $f$ and $g$, we write $f = O(g)$ if
$f \le cg$ for all sufficiently large values of the variables of the two functions, where $c$
is an absolute positive constant. We write $f = \Omega(g)$ if $g = O(f)$ and $f = \Theta(g)$ if
$f = O(g)$ and $f = \Omega(g)$. If the ratio $f/g$ tends to zero as the variables
of the functions tend to infinity we write $f = o(g)$. Finally, $f \sim g$ denotes that $f =
(1 + o(1))g$; that is, $f/g$ tends to 1 when the variables tend to infinity. Each chapter
ends with a list of exercises. The more difficult ones are marked by (*). The exercis- 
es enable readers to check their understanding of the material and also provide the 
possibility of using the book as a textbook. 

This is the third edition of the book; it contains several improved results and cov- 
ers various additional topics that developed extensively during the last few years. 
The additions include a modern treatment of the Erdős–Rényi phase transition discussed
in Chapter 11, focusing on the behavior of the random graph near the emergence
of the giant component and briefly exploring its connection to classical percolation
theory. Another addition is Chapter 17, Graph Property Testing, a recent
topic that combines combinatorial, probabilistic and algorithmic techniques. This 
chapter also includes a proof of the Regularity Lemma of Szemerédi (described in a 
probabilistic language) and a presentation of some of its applications in the area. 
Further additions are two new Probabilistic Lenses, several additional exercises, 
and a new part in Appendix A focused on lower bounds. 

It is a special pleasure to thank our wives, Nurit and Mary Ann. Their patience, 
understanding and encouragement have been key ingredients in the success of this 
enterprise. 


NOGA ALON
JOEL H. SPENCER


The Basic Method


What you need is that your brain is open. 
— Paul Erdos 


1.1 THE PROBABILISTIC METHOD 


The probabilistic method is a powerful tool for tackling many problems in discrete 
mathematics. Roughly speaking, the method works as follows: Trying to prove that a 
structure with certain desired properties exists, one defines an appropriate probability 
space of structures and then shows that the desired properties hold in this space with
positive probability.
The method is best illustrated by examples. Here is a simple one. The Ramsey
number $R(k, \ell)$ is the smallest integer $n$ such that in any two-coloring of the edges
of a complete graph on $n$ vertices $K_n$ by red and blue, either there is a red $K_k$ (i.e.,
a complete subgraph on $k$ vertices all of whose edges are colored red) or there is a
blue $K_\ell$. Ramsey (1929) showed that $R(k, \ell)$ is finite for any two integers $k$ and $\ell$.
Let us obtain a lower bound for the diagonal Ramsey numbers $R(k, k)$.


Proposition 1.1.1 If $\binom{n}{k} \cdot 2^{1-\binom{k}{2}} < 1$ then $R(k,k) > n$. Thus $R(k,k) > \lfloor 2^{k/2} \rfloor$ for
all $k \ge 3$.

Proof. Consider a random two-coloring of the edges of $K_n$ obtained by coloring
each edge independently either red or blue, where each color is equally likely. For
any fixed set $R$ of $k$ vertices, let $A_R$ be the event that the induced subgraph of $K_n$ on
$R$ is monochromatic (i.e., that either all its edges are red or they are all blue). Clearly,
$\Pr[A_R] = 2^{1-\binom{k}{2}}$. Since there are $\binom{n}{k}$ possible choices for $R$, the probability
that at least one of the events $A_R$ occurs is at most $\binom{n}{k} 2^{1-\binom{k}{2}} < 1$. Thus, with
positive probability, no event $A_R$ occurs and there is a two-coloring of $K_n$ without a
monochromatic $K_k$; that is, $R(k,k) > n$. Note that if $k \ge 3$ and we take $n = \lfloor 2^{k/2} \rfloor$
then
$$\binom{n}{k} 2^{1-\binom{k}{2}} < \frac{2^{1+k/2}}{k!} \cdot \frac{n^k}{2^{k^2/2}} < 1$$
and hence $R(k,k) > \lfloor 2^{k/2} \rfloor$ for all $k \ge 3$. ∎
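
As a numerical illustration of Proposition 1.1.1, the following Python sketch evaluates the union bound exactly and reports the largest $n$ that it certifies for a given $k$. The names `bound_holds` and `largest_certified_n` are ours, not the book's, and the search relies on the fact that $\binom{n}{k}$ is increasing in $n$ for $n \ge k$.

```python
from math import comb, isqrt

def bound_holds(n: int, k: int) -> bool:
    """True iff C(n,k) * 2^(1 - C(k,2)) < 1, checked with exact integers."""
    return comb(n, k) < 2 ** (comb(k, 2) - 1)

def largest_certified_n(k: int) -> int:
    """Largest n >= k for which the bound holds, so that R(k,k) > n (needs k >= 3)."""
    n = k
    while bound_holds(n + 1, k):
        n += 1
    return n

def floor_two_to_half_k(k: int) -> int:
    """floor(2^(k/2)), computed exactly as the integer square root of 2^k."""
    return isqrt(2 ** k)

if __name__ == "__main__":
    for k in range(3, 11):
        print(k, largest_certified_n(k), floor_two_to_half_k(k))
```

For $k = 3, 4, 5$ the certified values are $n = 3, 6, 11$, each at least $\lfloor 2^{k/2} \rfloor = 2, 4, 5$, as the proposition predicts.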


This simple example demonstrates the essence of the probabilistic method. To 
prove the existence of a good coloring we do not present one explicitly, but rather 
show, in a nonconstructive way, that it exists. This example appeared in a paper of 
P. Erdős from 1947. Although Szele had applied the probabilistic method to another
combinatorial problem, mentioned in Chapter 2, already in 1943, Erdős was certainly
the first one who understood the full power of this method and applied it successfully
over the years to numerous problems. One can, of course, claim that the probability
is not essential in the proof given above. An equally simple proof can be described
by counting; we just check that the total number of two-colorings of $K_n$ is larger
than the number of those containing a monochromatic $K_k$.

Moreover, since the vast majority of the probability spaces considered in the 
study of combinatorial problems are finite spaces, this claim applies to most of the 
applications of the probabilistic method in discrete mathematics. Theoretically, this 
is, indeed, the case. However, in practice, the probability is essential. It would 
be hopeless to replace the applications of many of the tools appearing in this book, 
including, for example, the second moment method, the Lovász Local Lemma and the
concentration via martingales by counting arguments, even when these are applied 
to finite probability spaces. 

The probabilistic method has an interesting algorithmic aspect. Consider, for 
example, the proof of Proposition 1.1.1 that shows that there is an edge two-coloring 
of $K_n$ without a monochromatic $K_{2\log_2 n}$. Can we actually find such a coloring?
This question, as asked, may sound ridiculous; the total number of possible colorings
is finite, so we can try them all until we find the desired one. However, such a
procedure may require $2^{\binom{n}{2}}$ steps; an amount of time that is exponential in the size
[$= \binom{n}{2}$] of the problem. Algorithms whose running time is more than polynomial in
the size of the problem are usually considered impractical. The class of problems that 
can be solved in polynomial time, usually denoted by P [see, e.g., Aho, Hopcroft and 
Ullman (1974)], is, in a sense, the class of all solvable problems. In this sense, the 
exhaustive search approach suggested above for finding a good coloring of $K_n$ is not
acceptable, and this is the reason for our remark that the proof of Proposition 1.1.1 is 
nonconstructive; it does not supply a constructive, efficient and deterministic way of 
producing a coloring with the desired properties. However, a closer look at the proof
shows that, in fact, it can be used to produce, effectively, a coloring that is very likely
to be good. This is because for large $k$, if $n = \lfloor 2^{k/2} \rfloor$ then
$$\binom{n}{k} 2^{1-\binom{k}{2}} < \frac{2^{1+k/2}}{k!} \left(\frac{n}{2^{k/2}}\right)^k \le \frac{2^{1+k/2}}{k!} \ll 1.$$

Hence, a random coloring of $K_n$ is very likely not to contain a monochromatic
$K_{2\log_2 n}$. This means that if, for some reason, we must present a two-coloring
of the edges of $K_{1024}$ without a monochromatic $K_{20}$ we can simply produce a
random two-coloring by flipping a fair coin $\binom{1024}{2}$ times. We can then deliver the
resulting coloring safely; the probability that it contains a monochromatic $K_{20}$ is
less than $2^{11}/20!$, probably much smaller than our chances of making a mistake in
any rigorous proof that a certain coloring is good! Therefore, in some cases the 
probabilistic, nonconstructive method does supply effective probabilistic algorithms. 
Moreover, these algorithms can sometimes be converted into deterministic ones. This 
topic is discussed in some detail in Chapter 16. 

The probabilistic method is a powerful tool in Combinatorics and in Graph Theory. 
It is also extremely useful in Number Theory and in Combinatorial Geometry. More 
recently, it has been applied in the development of efficient algorithmic techniques 
and in the study of various computational problems. In the rest of this chapter 
we present several simple examples that demonstrate some of the broad spectrum 
of topics in which this method is helpful. More complicated examples, involving 
various more delicate probabilistic arguments, appear in the rest of the book. 


1.2 GRAPH THEORY 


A tournament on a set V of n players is an orientation T = (V, E) of the edges of the 
complete graph on the set of vertices V. Thus for every two distinct elements x and 
y of V either (x, y) or (y, x) is in E, but not both. The name “tournament” is natural, 
since one can think of the set V as a set of players in which each pair participates in 
a single match, where $(x, y)$ is in the tournament iff $x$ beats $y$. We say that $T$ has the
property $S_k$ if for every set of $k$ players there is one who beats them all. For example, a
directed triangle $T_3 = (V, E)$, where $V = \{1, 2, 3\}$ and $E = \{(1, 2), (2, 3), (3, 1)\}$,
has $S_1$. Is it true that for every finite $k$ there is a tournament $T$ (on more than $k$ vertices)
with the property $S_k$? As shown by Erdős (1963b), this problem, raised by Schütte,
can be solved almost trivially by applying probabilistic arguments. Moreover, these
arguments even supply a rather sharp estimate for the minimum possible number of
vertices in such a tournament. The basic (and natural) idea is that if $n$ is sufficiently
large as a function of $k$, then a random tournament on the set $V = \{1, \ldots, n\}$ of $n$
players is very likely to have property $S_k$. By a random tournament we mean here a
tournament $T$ on $V$ obtained by choosing, for each $1 \le i < j \le n$, independently,
either the edge $(i, j)$ or the edge $(j, i)$, where each of these two choices is equally
likely. Observe that in this manner, all the $2^{\binom{n}{2}}$ possible tournaments on $V$ are
equally likely; that is, the probability space considered is symmetric. It is worth
noting that we often use in applications symmetric probability spaces. In these cases, 
we shall sometimes refer to an element of the space as a random element, without 
describing explicitly the probability distribution. Thus, for example, in the proof of 
Proposition 1.1.1 random two-colorings of $K_n$ were considered; that is, all possible
colorings were equally likely. Similarly, in the proof of the next simple result we 
study random tournaments on V. 


Theorem 1.2.1 If $\binom{n}{k}(1 - 2^{-k})^{n-k} < 1$ then there is a tournament on $n$ vertices
that has the property $S_k$.


Proof. Consider a random tournament on the set $V = \{1, \ldots, n\}$. For every fixed
subset $K$ of size $k$ of $V$, let $A_K$ be the event that there is no vertex that beats all
the members of $K$. Clearly $\Pr[A_K] = (1 - 2^{-k})^{n-k}$. This is because for each
fixed vertex $v \in V - K$, the probability that $v$ does not beat all the members of $K$ is
$1 - 2^{-k}$, and all these $n - k$ events corresponding to the various possible choices of
$v$ are independent. It follows that
$$\Pr\Big[\bigvee_{K \subset V,\ |K| = k} A_K\Big] \le \sum_{K \subset V,\ |K| = k} \Pr[A_K] = \binom{n}{k}(1 - 2^{-k})^{n-k} < 1.$$
Therefore, with positive probability, no event $A_K$ occurs; that is, there is a tournament
on $n$ vertices that has the property $S_k$. ∎
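
To get a feel for the condition in Theorem 1.2.1, the following Python sketch finds the smallest $n > k$ that satisfies it, using exact rational arithmetic. The names `tournament_bound` and `smallest_certified_n` are hypothetical helpers introduced here for illustration only.

```python
from fractions import Fraction
from math import comb

def tournament_bound(n: int, k: int) -> Fraction:
    """Exact value of C(n,k) * (1 - 2^-k)^(n-k), the union bound in the proof."""
    return comb(n, k) * (1 - Fraction(1, 2 ** k)) ** (n - k)

def smallest_certified_n(k: int) -> int:
    """Smallest n > k with tournament_bound(n, k) < 1."""
    n = k + 1
    while tournament_bound(n, k) >= 1:
        n += 1
    return n

if __name__ == "__main__":
    for k in range(1, 6):
        print(k, smallest_certified_n(k))
```

For $k = 1$ the search returns 3 and for $k = 2$ it returns 21. The condition of the theorem is sufficient but not necessary, so these values are only upper bounds on the true minimum.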


Let $f(k)$ denote the minimum possible number of vertices of a tournament that
has the property $S_k$. Since $\binom{n}{k} < (en/k)^k$ and $(1 - 2^{-k})^{n-k} < e^{-(n-k)/2^k}$,
Theorem 1.2.1 implies that $f(k) \le k^2 \cdot 2^k \cdot (\ln 2)(1 + o(1))$. It is not too difficult
to check that $f(1) = 3$ and $f(2) = 7$. As proved by Szekeres [cf. Moon (1968)],
$f(k) \ge c_1 \cdot k \cdot 2^k$.

Can one find an explicit construction of tournaments with at most $c_2^k$ vertices
having property $S_k$? Such a construction is known but is not trivial; it is described in
Chapter 9. 

A dominating set of an undirected graph $G = (V, E)$ is a set $U \subseteq V$ such that
every vertex $v \in V - U$ has at least one neighbor in $U$.


Theorem 1.2.2 Let $G = (V, E)$ be a graph on $n$ vertices, with minimum degree
$\delta > 1$. Then $G$ has a dominating set of at most $n[1 + \ln(\delta + 1)]/(\delta + 1)$ vertices.


Proof. Let $p \in [0,1]$ be, for the moment, arbitrary. Let us pick, randomly and
independently, each vertex of $V$ with probability $p$. Let $X$ be the (random) set of all
vertices picked and let $Y = Y_X$ be the random set of all vertices in $V - X$ that do not
have any neighbor in $X$. The expected value of $|X|$ is clearly $np$. For each fixed vertex
$v \in V$, $\Pr[v \in Y] = \Pr[v \text{ and its neighbors are not in } X] \le (1 - p)^{\delta+1}$. Since the
expected value of a sum of random variables is the sum of their expectations (even
if they are not independent) and since the random variable $|Y|$ can be written as
a sum of $n$ indicator random variables $\chi_v$ ($v \in V$), where $\chi_v = 1$ if $v \in Y$ and
$\chi_v = 0$ otherwise, we conclude that the expected value of $|X| + |Y|$ is at most
$np + n(1 - p)^{\delta+1}$. Consequently, there is at least one choice of $X \subseteq V$ such that
$|X| + |Y_X| \le np + n(1 - p)^{\delta+1}$. The set $U = X \cup Y_X$ is clearly a dominating set
of $G$ whose cardinality is at most this size.

The above argument works for any $p \in [0,1]$. To optimize the result we use
elementary calculus. For convenience we bound $1 - p \le e^{-p}$ (this holds for all
nonnegative $p$ and is a fairly close bound when $p$ is small) to give the simpler bound
$$|U| \le np + n e^{-p(\delta+1)}.$$
Take the derivative of the right-hand side with respect to $p$ and set it equal to zero.
The right-hand side is minimized at
$$p = \frac{\ln(\delta + 1)}{\delta + 1}.$$
Formally, we set $p$ equal to this value in the first line of the proof. We now have
$|U| \le n[1 + \ln(\delta + 1)]/(\delta + 1)$ as claimed. ∎


Three simple but important ideas are incorporated in the last proof. The first is 
the linearity of expectation; many applications of this simple, yet powerful principle 
appear in Chapter 2. The second is perhaps more subtle and is an example of the 
“alteration” principle that is discussed in Chapter 3. The random choice did not 
supply the required dominating set U immediately; it only supplied the set X, which 
has to be altered a little (by adding to it the set $Y_X$) to provide the required dominating
set. The third involves the optimal choice of $p$. One often wants to make a random
choice but is not certain what probability $p$ should be used. The idea is to carry out
the proof with $p$ as a parameter giving a result that is a function of $p$. At the end, that
$p$ is selected which gives the optimal result. There is here yet a fourth idea that might
be called asymptotic calculus. We wanted the asymptotics of $\min\, np + n(1 - p)^{\delta+1}$,
where $p$ ranges over $[0,1]$. The actual minimum $p = 1 - (\delta + 1)^{-1/\delta}$ is difficult
to deal with and in many similar cases precise minima are impossible to find in
closed form. Rather, we give away a little bit, bounding $1 - p \le e^{-p}$, yielding
a clean bound. A good part of the art of the probabilistic method lies in finding
suboptimal but clean bounds. Did we give away too much in this case? The answer
depends on the emphasis for the original question. For $\delta = 3$ our rough bound gives
$|U| \le 0.596n$ while the more precise calculation gives $|U| \le 0.496n$, perhaps a
substantial difference. For $\delta$ large both methods give asymptotically $n \ln \delta/\delta$.

It can easily be deduced from the results in Alon (1990b) that the bound in 
Theorem 1.2.2 is nearly optimal. A nonprobabilistic, algorithmic proof of this
theorem can be obtained by choosing the vertices for the dominating set one by 
one, when in each step a vertex that covers the maximum number of yet uncovered 
vertices is picked. Indeed, for each vertex v denote by C(v) the set consisting of v 
together with all its neighbors. Suppose that during the process of picking vertices 
the number of vertices $u$ that do not lie in the union of the sets $C(v)$ of the vertices
chosen so far is $r$. By the assumption, the sum of the cardinalities of the sets $C(u)$
over all such uncovered vertices $u$ is at least $r(\delta + 1)$, and hence, by averaging, there
is a vertex $v$ that belongs to at least $r(\delta + 1)/n$ such sets $C(u)$. Adding this $v$ to
the set of chosen vertices we observe that the number of uncovered vertices is now
at most $r(1 - (\delta + 1)/n)$. It follows that in each iteration of the above procedure the
number of uncovered vertices decreases by a factor of $1 - (\delta + 1)/n$ and hence after
$n \ln(\delta + 1)/(\delta + 1)$ steps there will be at most $n/(\delta + 1)$ yet uncovered vertices that
can now be added to the set of chosen vertices to form a dominating set of size at
most equal to the one in the conclusion of Theorem 1.2.2.
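
The greedy procedure just described can be sketched in Python as follows. This is a simplified variant under our own assumptions: the graph is given as a dictionary `adj` mapping each vertex to the set of its neighbors, and the loop keeps picking greedily until every vertex is covered, rather than stopping after $n \ln(\delta + 1)/(\delta + 1)$ steps and adding the remaining vertices. The function names are ours.

```python
from math import log

def greedy_dominating_set(adj):
    """Repeatedly pick a vertex whose closed neighborhood C(v) covers the most
    still-uncovered vertices; ties are broken by the smallest vertex label."""
    closed = {v: adj[v] | {v} for v in adj}   # C(v) = v together with its neighbors
    uncovered = set(adj)
    chosen = []
    while uncovered:
        best = max(sorted(adj), key=lambda v: len(closed[v] & uncovered))
        chosen.append(best)
        uncovered -= closed[best]
    return chosen

def is_dominating(adj, U):
    """True iff every vertex is in U or has a neighbor in U."""
    U = set(U)
    return all(v in U or bool(adj[v] & U) for v in adj)

def theorem_bound(n, delta):
    """The bound n[1 + ln(delta + 1)] / (delta + 1) of Theorem 1.2.2."""
    return n * (1 + log(delta + 1)) / (delta + 1)

if __name__ == "__main__":
    # Example: the cycle on 12 vertices, which has minimum degree 2.
    n = 12
    cycle = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
    U = greedy_dominating_set(cycle)
    print(U, is_dominating(cycle, U), theorem_bound(n, 2))
```

Each greedy step covers at least one new vertex, so the loop always terminates with a dominating set.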

Combining this with some ideas of Podderyugin and Matula, we can obtain a very 
efficient algorithm to decide if a given undirected graph on n vertices is, say, n/2 
edge-connected. A cut in a graph $G = (V, E)$ is a partition of the set of vertices $V$
into two nonempty disjoint sets $V = V_1 \cup V_2$. If $v_1 \in V_1$ and $v_2 \in V_2$ we say that
the cut separates $v_1$ and $v_2$. The size of the cut is the number of edges of $G$ having
one end in $V_1$ and another end in $V_2$. In fact, we sometimes identify the cut with the
set of these edges. The edge connectivity of G is the minimum size of a cut of G. 
The following lemma is due to Podderyugin and Matula (independently). 


Lemma 1.2.3 Let $G = (V, E)$ be a graph with minimum degree $\delta$ and let $V = V_1 \cup V_2$
be a cut of size smaller than $\delta$ in $G$. Then every dominating set $U$ of $G$ has vertices
in $V_1$ and in $V_2$.


Proof. Suppose this is false and $U \subseteq V_1$. Choose, arbitrarily, a vertex $v \in V_2$ and
let $v_1, v_2, \ldots, v_\delta$ be $\delta$ of its neighbors. For each $i$, $1 \le i \le \delta$, define an edge $e_i$ of
the given cut as follows: if $v_i \in V_1$ then $e_i = \{v, v_i\}$; otherwise, $v_i \in V_2$ and since
$U$ is dominating there is at least one vertex $u \in U$ such that $\{u, v_i\}$ is an edge; take
such a $u$ and put $e_i = \{u, v_i\}$. The $\delta$ edges $e_1, \ldots, e_\delta$ are all distinct and all lie in
the given cut, contradicting the assumption that its size is less than $\delta$. This completes
the proof. ∎


Let $G = (V, E)$ be a graph on $n$ vertices, and suppose we wish to decide if
$G$ is $n/2$ edge-connected; that is, if its edge connectivity is at least $n/2$. Matula
showed, by applying Lemma 1.2.3, that this can be done in time $O(n^3)$. By the
remark following the proof of Theorem 1.2.2, we can slightly improve it and get
an $O(n^{8/3} \log n)$ algorithm as follows. We first check if the minimum degree $\delta$ of
$G$ is at least $n/2$. If not, $G$ is not $n/2$ edge-connected, and the algorithm ends.
Otherwise, by Theorem 1.2.2 there is a dominating set $U = \{u_1, \ldots, u_k\}$ of $G$,
where $k = O(\log n)$, and it can in fact be found in time $O(n^2)$. We now find, for
each $i$, $2 \le i \le k$, the minimum size $s_i$ of a cut that separates $u_1$ from $u_i$. Each of
these problems can be solved by solving a standard network flow problem in time
$O(n^{8/3})$ [see, e.g., Tarjan (1983)]. By Lemma 1.2.3 the edge connectivity of $G$ is
simply the minimum between $\delta$ and $\min_{2 \le i \le k} s_i$. The total time of the algorithm is
$O(n^{8/3} \log n)$, as claimed.


1.3 COMBINATORICS


A hypergraph is a pair H = (V, E), where V is a finite set whose elements are called 
vertices and E is a family of subsets of V, called edges. It is n-uniform if each of 
its edges contains precisely n vertices. We say that H has property B, or that it is 
two-colorable if there is a two-coloring of V such that no edge is monochromatic. 
Let m(n) denote the minimum possible number of edges of an n-uniform hypergraph 
that does not have property B. 


Proposition 1.3.1 [Erdős (1963a)] Every $n$-uniform hypergraph with fewer than $2^{n-1}$
edges has property B. Therefore $m(n) \ge 2^{n-1}$.


Proof. Let $H = (V, E)$ be an $n$-uniform hypergraph with fewer than $2^{n-1}$ edges.
Color $V$ randomly by two colors. For each edge $e \in E$, let $A_e$ be the event that $e$ is
monochromatic. Clearly $\Pr[A_e] = 2^{1-n}$. Therefore
$$\Pr\Big[\bigvee_{e \in E} A_e\Big] \le \sum_{e \in E} \Pr[A_e] < 1$$
and there is a two-coloring without monochromatic edges. ∎
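
The proof above shows that a single random coloring succeeds with probability at least $1 - |E| \cdot 2^{1-n} > 0$, which suggests simply retrying until a good coloring appears. The following Python sketch does this; the retry loop, the function name `find_two_coloring` and the `max_tries` cap are our additions for illustration, not part of the text.

```python
import random

def has_monochromatic_edge(color, edges):
    """True iff some edge receives only one color under the given coloring."""
    return any(len({color[v] for v in e}) == 1 for e in edges)

def find_two_coloring(vertices, edges, rng, max_tries=10_000):
    """Color each vertex 0 or 1 uniformly at random; retry until no edge is
    monochromatic. Returns the coloring as a dict, or None if all tries fail."""
    for _ in range(max_tries):
        color = {v: rng.randrange(2) for v in vertices}
        if not has_monochromatic_edge(color, edges):
            return color
    return None

if __name__ == "__main__":
    # A 3-uniform hypergraph with 3 < 2^(3-1) = 4 edges.
    vertices = range(6)
    edges = [(0, 1, 2), (2, 3, 4), (0, 3, 5)]
    print(find_two_coloring(vertices, edges, random.Random(0)))
```

Since each try succeeds with probability bounded away from zero, the expected number of tries is at most $1/(1 - |E| \cdot 2^{1-n})$.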


In Section 3.5 we present a more delicate argument, due to Radhakrishnan and 
Srinivasan, and based on an idea of Beck, that shows that 


$$m(n) = \Omega\left(\left(\frac{n}{\ln n}\right)^{1/2} 2^n\right).$$


The best known upper bound to m(n) is found by turning the probabilistic argu- 
ment “on its head.” Basically, the sets become random and each coloring defines an 
event. Fix $V$ with $v$ points, where we shall later optimize $v$. Let $\chi$ be a coloring of $V$
with $a$ points in one color, $b = v - a$ points in the other. Let $S \subset V$ be a uniformly
selected $n$-set. Then
$$\Pr[S \text{ is monochromatic under } \chi] = \frac{\binom{a}{n} + \binom{b}{n}}{\binom{v}{n}}.$$
Let us assume $v$ is even for convenience. As $\binom{y}{n}$ is convex, this expression is
minimized when $a = b$. Thus
$$\Pr[S \text{ is monochromatic under } \chi] \ge p,$$
where we set
$$p = \frac{2\binom{v/2}{n}}{\binom{v}{n}}$$
for notational convenience. Now let $S_1, \ldots, S_m$ be uniformly and independently
chosen $n$-sets, $m$ to be determined. For each coloring $\chi$ let $A_\chi$ be the event that none
of the $S_i$ are monochromatic. By the independence of the $S_i$
$$\Pr[A_\chi] \le (1 - p)^m.$$
There are $2^v$ colorings so
$$\Pr\Big[\bigvee_\chi A_\chi\Big] \le 2^v (1 - p)^m.$$
When this quantity is less than 1 there exist $S_1, \ldots, S_m$ so that no $A_\chi$ holds; that is,
$S_1, \ldots, S_m$ is not two-colorable and hence $m(n) \le m$.

The asymptotics provide a fairly typical example of those encountered when 
employing the probabilistic method. We first use the inequality $1 - p \le e^{-p}$. This
is valid for all positive $p$ and the terms are quite close when $p$ is small. When
$$m = \left\lceil \frac{v \ln 2}{p} \right\rceil$$
then $2^v (1 - p)^m < 2^v e^{-pm} \le 1$ so $m(n) \le m$. Now we need to find $v$ to minimize
$v/p$. We may interpret $p$ as twice the probability of picking $n$ white balls from
an urn with $v/2$ white and $v/2$ black balls, sampling without replacement. It is
tempting to estimate $p$ by $2^{-n+1}$, the probability for sampling with replacement.
This approximation would yield $m \sim v 2^{n-1} (\ln 2)$. As $v$ gets smaller, however, the
approximation becomes less accurate and, as we wish to minimize $m$, the trade-off
becomes essential. We use a second order approximation
$$p = \frac{2\binom{v/2}{n}}{\binom{v}{n}} = 2^{1-n} \prod_{i=0}^{n-1} \frac{v - 2i}{v - i} \sim 2^{1-n} e^{-n^2/2v}$$
as long as $v \gg n^{3/2}$, estimating
$$\frac{v - 2i}{v - i} = 1 - \frac{i}{v} + O\left(\frac{i^2}{v^2}\right) = e^{-i/v + O(i^2/v^2)}.$$
Elementary calculus gives $v = n^2/2$ for the optimal value. The evenness of $v$ may
require a change of at most 2, which turns out to be asymptotically negligible. This
yields the following result of Erdős (1964).


Theorem 1.3.2 $m(n) < (1 + o(1)) \frac{e \ln 2}{4} n^2 2^n$.


Let $\mathcal{F} = \{(A_i, B_i)\}_{i=1}^{h}$ be a family of pairs of subsets of an arbitrary set. We
call $\mathcal{F}$ a $(k, \ell)$-system if $|A_i| = k$ and $|B_i| = \ell$ for all $1 \le i \le h$, $A_i \cap B_i = \emptyset$
and $A_i \cap B_j \ne \emptyset$ for all distinct $i, j$ with $1 \le i, j \le h$. Bollobás (1965) proved the
following result, which has many interesting extensions and applications. 


Theorem 1.3.3 If $\mathcal{F} = \{(A_i, B_i)\}_{i=1}^{h}$ is a $(k, \ell)$-system then $h \le \binom{k+\ell}{k}$.


Proof. Put $X = \bigcup_{i=1}^{h} (A_i \cup B_i)$ and consider a random order $\pi$ of $X$. For each $i$,
$1 \le i \le h$, let $X_i$ be the event that all the elements of $A_i$ precede all those of $B_i$ in
this order. Clearly $\Pr[X_i] = 1/\binom{k+\ell}{k}$. It is also easy to check that the events $X_i$
are pairwise disjoint. Indeed, assume this is false and let $\pi$ be an order in which all
the elements of $A_i$ precede those of $B_i$ and all the elements of $A_j$ precede those of
$B_j$. Without loss of generality we may assume that the last element of $A_i$ does not
appear after the last element of $A_j$. But in this case, all elements of $A_i$ precede all
those of $B_j$, contradicting the fact that $A_i \cap B_j \ne \emptyset$. Therefore all the events $X_i$ are
pairwise disjoint, as claimed. It follows that
$$1 \ge \Pr\Big[\bigvee_{i=1}^{h} X_i\Big] = \sum_{i=1}^{h} \Pr[X_i] = h \Big/ \binom{k+\ell}{k},$$
completing the proof. ∎


Theorem 1.3.3 is sharp, as shown by the family $\mathcal{F} = \{(A, X \setminus A) : A \subset X, |A| = k\}$,
where $X = \{1, 2, \ldots, k + \ell\}$.
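
For small parameters the key step of the proof, that the events $X_i$ are pairwise disjoint, can be checked by brute force on this extremal family. The Python sketch below enumerates all orders of $X$; the helper names (`sharp_family`, `is_system`, `events_hit`) are ours and exist only for this illustration.

```python
from itertools import combinations, permutations
from math import comb

def sharp_family(k, l):
    """All pairs (A, X \\ A) with |A| = k, where X = {1, ..., k + l}."""
    X = set(range(1, k + l + 1))
    return [(set(A), X - set(A)) for A in combinations(sorted(X), k)]

def is_system(family):
    """Check A_i ∩ B_i = ∅ and A_i ∩ B_j ≠ ∅ for all distinct i, j."""
    for i, (A, _) in enumerate(family):
        for j, (_, B) in enumerate(family):
            if (i == j) == bool(A & B):
                return False
    return True

def events_hit(order, family):
    """Number of pairs (A, B) with every element of A before every element of B."""
    pos = {x: t for t, x in enumerate(order)}
    return sum(max(pos[a] for a in A) < min(pos[b] for b in B) for A, B in family)

if __name__ == "__main__":
    k, l = 2, 2
    fam = sharp_family(k, l)
    counts = [events_hit(order, fam) for order in permutations(range(1, k + l + 1))]
    print(len(fam), comb(k + l, k), is_system(fam), max(counts), min(counts))
```

For this family every order satisfies exactly one event, so the inequality in the proof holds with equality.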


1.4 COMBINATORIAL NUMBER THEORY 


A subset $A$ of an abelian group $G$ is called sum-free if $(A + A) \cap A = \emptyset$; that is, if
there are no $a_1, a_2, a_3 \in A$ such that $a_1 + a_2 = a_3$.


Theorem 1.4.1 [Erdős (1965a)] Every set $B = \{b_1, \ldots, b_n\}$ of $n$ nonzero integers
contains a sum-free subset $A$ of size $|A| > \frac{1}{3}n$.


Proof. Let $p = 3k + 2$ be a prime, which satisfies $p > 2\max\{|b_i|\}_{i=1}^{n}$ and put
$C = \{k+1, k+2, \ldots, 2k+1\}$. Observe that $C$ is a sum-free subset of the cyclic
group $Z_p$ and that
$$\frac{|C|}{p-1} = \frac{k+1}{3k+1} > \frac{1}{3}.$$
Let us choose at random an integer $x$, $1 \le x < p$, according to a uniform distribution
on $\{1, 2, \ldots, p-1\}$, and define $d_1, \ldots, d_n$ by $d_i \equiv x b_i \pmod{p}$, $0 \le d_i < p$.
Trivially, for every fixed $i$, $1 \le i \le n$, as $x$ ranges over all numbers $1, 2, \ldots, p-1$, $d_i$
ranges over all nonzero elements of $Z_p$ and hence $\Pr[d_i \in C] = |C|/(p-1) > \frac{1}{3}$.
Therefore the expected number of elements $b_i$ such that $d_i \in C$ is more than $n/3$.
Consequently, there is an $x$, $1 \le x < p$, and a subsequence $A$ of $B$ of cardinality
$|A| > n/3$, such that $xa \pmod{p} \in C$ for all $a \in A$. This $A$ is clearly sum-free,
since if $a_1 + a_2 = a_3$ for some $a_1, a_2, a_3 \in A$ then $x a_1 + x a_2 \equiv x a_3 \pmod{p}$,
contradicting the fact that $C$ is a sum-free subset of $Z_p$. This completes the proof. ∎
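
The proof translates directly into a simple deterministic search: instead of choosing $x$ at random, try every $x$ and keep the best. The Python sketch below does this under our own assumptions (trial-division primality testing, input given as a list of distinct nonzero integers); the function names are hypothetical and it is meant as an illustration rather than an efficient algorithm.

```python
def is_prime(m):
    """Trial-division primality test, adequate for small inputs."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

def sum_free_subset(B):
    """Return a sum-free subset of B with more than len(B)/3 elements."""
    # Smallest prime p = 3k + 2 exceeding 2 * max|b_i|.
    p = 2 * max(abs(b) for b in B) + 1
    while not (p % 3 == 2 and is_prime(p)):
        p += 1
    k = (p - 2) // 3
    middle = range(k + 1, 2 * k + 2)          # C = {k+1, ..., 2k+1}
    best = []
    for x in range(1, p):
        # Python's % returns a value in [0, p), matching 0 <= d_i < p.
        A = [b for b in B if (x * b) % p in middle]
        if len(A) > len(best):
            best = A
    return best

def is_sum_free(A):
    """True iff no a1, a2 in A (possibly equal) have a1 + a2 in A."""
    S = set(A)
    return all(a + b not in S for a in A for b in A)

if __name__ == "__main__":
    B = [-7, -3, 1, 2, 4, 5, 9, 12]
    A = sum_free_subset(B)
    print(A, is_sum_free(A), len(A), len(B))
```

Because the average of $|A|$ over all $x$ exceeds $n/3$, the best $x$ found by the loop is guaranteed to give $|A| > n/3$.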

Remark. The above proof works whenever $p$ is a prime that does not divide any of
the numbers $b_i$. This can be used to design an efficient deterministic algorithm for
finding a sum-free subset A of size bigger than |B|/3 in a given set B as above. In 
Alon and Kleitman (1990) it is shown that every set of n nonzero elements of an 
arbitrary abelian group contains a sum-free subset of more than 2n/7 elements, and 
that the constant 2/7 is best possible. The best possible constant in Theorem 1.4.1 is 
not known. 


1.5 DISJOINT PAIRS 


The probabilistic method is most striking when it is applied to prove theorems whose 
statement does not seem to suggest at all the need for probability. Most of the 
examples given in the previous sections are simple instances of such statements. In 
this section we describe a (slightly) more complicated result, due to Alon and Frankl 
(1985), which solves a conjecture of Daykin and Erdős.

Let $\mathcal{F}$ be a family of $m$ distinct subsets of $X = \{1, 2, \ldots, n\}$. Let $d(\mathcal{F})$ denote
the number of disjoint pairs in $\mathcal{F}$; that is,
$$d(\mathcal{F}) = |\{\{F, F'\} : F, F' \in \mathcal{F},\ F \cap F' = \emptyset\}|.$$

Daykin and Erdős conjectured that if $m = 2^{(1/2+\delta)n}$, then, for every fixed $\delta > 0$,
$d(\mathcal{F}) = o(m^2)$, as $n$ tends to infinity. This result follows from the following theorem,
which is a special case of a more general result. 


Theorem 1.5.1 Let $\mathcal{F}$ be a family of $m = 2^{(1/2+\delta)n}$ subsets of $X = \{1, 2, \ldots, n\}$,
where $\delta > 0$. Then
$$d(\mathcal{F}) < m^{2-\delta^2/2}. \qquad (1.1)$$


Proof. Suppose (1.1) is false and pick independently $t$ members $A_1, A_2, \ldots, A_t$ of $\mathcal{F}$
with repetitions at random, where $t$ is a large positive integer, to be chosen later. We
will show that with positive probability $|A_1 \cup A_2 \cup \cdots \cup A_t| > n/2$ and still this
union is disjoint to more than $2^{n/2}$ distinct subsets of $X$. This contradiction will
establish (1.1).

In fact,
$$\Pr[|A_1 \cup A_2 \cup \cdots \cup A_t| \le n/2] \le \sum_{S \subset X,\ |S| = n/2} \Pr[A_i \subset S,\ i = 1, \ldots, t] \le 2^n \left(\frac{2^{n/2}}{2^{(1/2+\delta)n}}\right)^t = 2^{n(1-\delta t)}. \qquad (1.2)$$
Define
$$v(B) = |\{A \in \mathcal{F} : B \cap A = \emptyset\}|.$$
Clearly,
$$\sum_{B \in \mathcal{F}} v(B) = 2d(\mathcal{F}) \ge 2m^{2-\delta^2/2}.$$
Let $Y$ be a random variable whose value is the number of members $B \in \mathcal{F}$ that are
disjoint to all the $A_i$ ($1 \le i \le t$). By the convexity of $z^t$ the expected value of $Y$
satisfies
$$E[Y] = \sum_{B \in \mathcal{F}} \left(\frac{v(B)}{m}\right)^t = \frac{1}{m^t} \cdot m \cdot \frac{\sum_{B \in \mathcal{F}} v(B)^t}{m} \ge \frac{1}{m^t} \cdot m \cdot \left(\frac{2d(\mathcal{F})}{m}\right)^t \ge 2m^{1-t\delta^2/2}.$$
Since $Y \le m$ we conclude that
$$\Pr\left[Y \ge m^{1-t\delta^2/2}\right] \ge m^{-t\delta^2/2}. \qquad (1.3)$$
One can check that for $t = \lceil 1 + 1/\delta \rceil$, $m^{1-t\delta^2/2} > 2^{n/2}$ and the right-hand side
of (1.3) is greater than the right-hand side of (1.2). Thus, with positive probability,
$|A_1 \cup A_2 \cup \cdots \cup A_t| > n/2$ and still this union is disjoint to more than $2^{n/2}$ members
of $\mathcal{F}$. This contradiction implies inequality (1.1). ∎


1.6 EXERCISES 


1. Prove that if there is a real $p$, $0 < p < 1$ such that
$$\binom{n}{k} p^{\binom{k}{2}} + \binom{n}{t} (1 - p)^{\binom{t}{2}} < 1,$$
then the Ramsey number $R(k, t)$ satisfies $R(k, t) > n$. Using this, show that
$$R(4, t) \ge \Omega\left(t^{3/2}/(\ln t)^{3/2}\right).$$


2. Suppose $n > 4$ and let $H$ be an $n$-uniform hypergraph with at most $4^{n-1}/3^n$
edges. Prove that there is a coloring of the vertices of H by four colors so that 
in every edge all four colors are represented. 


3. (*) Prove that for every two independent, identically distributed real random 
variables X and Y, 


$$\Pr[|X - Y| \le 2] \le 3 \Pr[|X - Y| \le 1].$$


4. (*) Let $G = (V, E)$ be a graph with $n$ vertices and minimum degree $\delta > 10$.
Prove that there is a partition of $V$ into two disjoint subsets $A$ and $B$ so that
$|A| \le O(n \ln \delta/\delta)$, and each vertex of $B$ has at least one neighbor in $A$ and at
least one neighbor in $B$.


5. (*) Let $G = (V, E)$ be a graph on $n > 10$ vertices and suppose that if we add
to $G$ any edge not in $G$ then the number of copies of a complete graph on 10
vertices in it increases. Show that the number of edges of $G$ is at least $8n - 36$.


6. (*) Theorem 1.2.1 asserts that for every integer $k > 0$ there is a tournament
$T_k = (V, E)$ with $|V| > k$ such that for every set $U$ of at most $k$ vertices of
$T_k$ there is a vertex $v$ so that all directed arcs $\{(v, u) : u \in U\}$ are in $E$. Show
that each such tournament contains at least $\Omega(k 2^k)$ vertices.


7. Let $\{(A_i, B_i), 1 \le i \le h\}$ be a family of pairs of subsets of the set of
integers such that $|A_i| = k$ for all $i$ and $|B_i| = l$ for all $i$, $A_i \cap B_i = \emptyset$ and
$(A_i \cap B_j) \cup (A_j \cap B_i) \ne \emptyset$ for all $i \ne j$. Prove that $h \le (k + l)^{k+l}/(k^k l^l)$.


8. (Prefix-free codes; Kraft Inequality). Let $F$ be a finite collection of binary
strings of finite lengths and assume no member of $F$ is a prefix of another one.
Let $N_i$ denote the number of strings of length $i$ in $F$. Prove that
$$\sum_i \frac{N_i}{2^i} \le 1.$$


9. (*) (Uniquely decipherable codes; Kraft–McMillan Inequality). Let $F$ be a
finite collection of binary strings of finite lengths and assume that no two
distinct concatenations of two finite sequences of codewords result in the same
binary sequence. Let $N_i$ denote the number of strings of length $i$ in $F$. Prove
that
$$\sum_i \frac{N_i}{2^i} \le 1.$$


. Prove that there is an absolute constant c > 0 with the following property. 


Let A be an n by n matrix with pairwise distinct entries. Then there is 
a permutation of the rows of A so that no column in the permuted matrix 
contains an increasing subsequence of length at least c,/n. 


THE PROBABILISTIC LENS:
The Erdős–Ko–Rado Theorem


A family $\mathcal{F}$ of sets is called intersecting if $A, B \in \mathcal{F}$ implies $A \cap B \ne \emptyset$. Suppose
$n \ge 2k$ and let $\mathcal{F}$ be an intersecting family of $k$-element subsets of an $n$-set, for
definiteness $\{0,\ldots,n-1\}$. The Erdős–Ko–Rado Theorem is that $|\mathcal{F}| \le \binom{n-1}{k-1}$.
This is achievable by taking the family of $k$-sets containing a particular point. We
give a short proof due to Katona (1972).

Lemma 1 For $0 \le s \le n-1$ set $A_s = \{s, s+1, \ldots, s+k-1\}$ where addition is
modulo $n$. Then $\mathcal{F}$ can contain at most $k$ of the sets $A_s$.

Proof. Fix some $A_s \in \mathcal{F}$. All other sets $A_t$ that intersect $A_s$ can be partitioned into
$k-1$ pairs $\{A_{s-i}, A_{s+k-i}\}$, $(1 \le i \le k-1)$, and the members of each such pair are
disjoint. The result follows, since $\mathcal{F}$ can contain at most one member of each pair. ∎
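Lemma 1 and the Erdős–Ko–Rado bound can be checked by exhaustive search for very small parameters. The following is only a sanity-check sketch: the function names are illustrative, not from the text, and the search is exponential, so it is usable only for tiny $n$ and $k$.

```python
from itertools import combinations
from math import comb

def arcs(n, k):
    """The n cyclic intervals A_s = {s, s+1, ..., s+k-1} (addition mod n)."""
    return [frozenset((s + j) % n for j in range(k)) for s in range(n)]

def k_sets(n, k):
    """All k-element subsets of {0, ..., n-1}."""
    return [frozenset(c) for c in combinations(range(n), k)]

def max_intersecting(sets):
    """Size of a largest pairwise-intersecting subfamily (brute force)."""
    best = 0
    m = len(sets)
    for mask in range(1 << m):
        chosen = [sets[i] for i in range(m) if mask >> i & 1]
        if len(chosen) > best and all(a & b for a, b in combinations(chosen, 2)):
            best = len(chosen)
    return best

# Lemma 1 with n = 7, k = 3: at most k of the arcs can pairwise intersect.
print(max_intersecting(arcs(7, 3)), "<=", 3)
# The theorem with n = 5, k = 2: the maximum is C(n-1, k-1).
print(max_intersecting(k_sets(5, 2)), "<=", comb(4, 1))
```

The arcs are exactly the sets $A_s$ of the lemma; the second call searches all $k$-sets, which is the family the theorem bounds.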


Now we prove the Erdős–Ko–Rado Theorem. Let a permutation $\sigma$ of $\{0,\ldots,n-1\}$
and $i \in \{0,\ldots,n-1\}$ be chosen randomly, uniformly and independently and set
$A = \{\sigma(i), \sigma(i+1), \ldots, \sigma(i+k-1)\}$, addition again modulo $n$. Conditioning on
any choice of $\sigma$ the lemma gives $\Pr[A \in \mathcal{F}] \le k/n$. Hence $\Pr[A \in \mathcal{F}] \le k/n$. But
$A$ is uniformly chosen from all $k$-sets so

\[ \frac{k}{n} \ge \Pr[A \in \mathcal{F}] = \frac{|\mathcal{F}|}{\binom{n}{k}} \]

and

\[ |\mathcal{F}| \le \frac{k}{n}\binom{n}{k} = \binom{n-1}{k-1}. \] ∎


Linearity of Expectation


The search for truth is more precious than its possession. 
— Albert Einstein 


2.1 BASICS 


Let $X_1,\ldots,X_n$ be random variables, $X = c_1X_1 + \cdots + c_nX_n$. Linearity of
expectation states that

\[ E[X] = c_1E[X_1] + \cdots + c_nE[X_n]. \]

The power of this principle comes from there being no restrictions on the dependence
or independence of the $X_i$. In many instances $E[X]$ can easily be calculated by a
judicious decomposition into simple (often indicator) random variables $X_i$.

The Probabilistic Method, Third Edition By Noga Alon and Joel Spencer
Copyright © 2008 John Wiley & Sons, Inc.

Let $\sigma$ be a random permutation on $\{1,\ldots,n\}$, uniformly chosen. Let $X(\sigma)$ be
the number of fixed points of $\sigma$. To find $E[X]$ we decompose $X = X_1 + \cdots + X_n$
where $X_i$ is the indicator random variable of the event $\sigma(i) = i$. Then

\[ E[X_i] = \Pr[\sigma(i) = i] = \frac{1}{n} \]

so that

\[ E[X] = \frac{1}{n} + \cdots + \frac{1}{n} = 1. \]
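The fixed-point computation can be confirmed exactly for small $n$ by enumerating every permutation. This is a minimal sketch; the function name is illustrative, and exact rational arithmetic is used so that no rounding is involved.

```python
from fractions import Fraction
from itertools import permutations

def expected_fixed_points(n):
    """Exact E[X] for a uniformly random permutation of {0, ..., n-1}."""
    perms = list(permutations(range(n)))
    total = sum(sum(1 for i, image in enumerate(p) if i == image) for p in perms)
    return Fraction(total, len(perms))

for n in range(1, 7):
    print(n, expected_fixed_points(n))
```

The indicators $X_i$ are far from independent (if $n-1$ points are fixed, so is the last one), yet the average is the same for every $n$, as linearity of expectation predicts.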


In applications we often use that there is a point in the probability space for which
$X \ge E[X]$ and a point for which $X \le E[X]$. We have selected results with a
purpose of describing this basic methodology. The following result of Szele (1943)
is oftentimes considered the first use of the probabilistic method.

Theorem 2.1.1 There is a tournament $T$ with $n$ players and at least $n!\,2^{-(n-1)}$
Hamiltonian paths.

Proof. In the random tournament let $X$ be the number of Hamiltonian paths. For each
permutation $\sigma$ let $X_\sigma$ be the indicator random variable for $\sigma$ giving a Hamiltonian
path; that is, satisfying $(\sigma(i), \sigma(i+1)) \in T$ for $1 \le i < n$. Then $X = \sum X_\sigma$ and

\[ E[X] = \sum E[X_\sigma] = n!\,2^{-(n-1)}. \]

Thus some tournament has at least $E[X]$ Hamiltonian paths. ∎
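For a very small number of players the expectation in this proof can be computed by listing all tournaments. The sketch below does this for $n = 4$; the helper names are illustrative, and the enumeration over all $2^{\binom{n}{2}}$ orientations is feasible only for tiny $n$.

```python
from fractions import Fraction
from itertools import combinations, permutations, product
from math import factorial

def hamiltonian_paths(n, beats):
    """Count orderings p in which p[i] beats p[i+1] for every consecutive pair."""
    return sum(
        all((p[i], p[i + 1]) in beats for i in range(n - 1))
        for p in permutations(range(n))
    )

def path_counts(n):
    """Hamiltonian path counts of all 2^C(n,2) tournaments on n players."""
    pairs = list(combinations(range(n), 2))
    counts = []
    for orientation in product((False, True), repeat=len(pairs)):
        beats = {(v, u) if flip else (u, v) for (u, v), flip in zip(pairs, orientation)}
        counts.append(hamiltonian_paths(n, beats))
    return counts

n = 4
counts = path_counts(n)
average = Fraction(sum(counts), len(counts))
print(average, Fraction(factorial(n), 2 ** (n - 1)), max(counts))
```

The average over all tournaments equals $n!\,2^{-(n-1)}$, so the maximum count is at least that large, which is the content of the theorem.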


Szele conjectured that the maximum possible number of Hamiltonian paths in a
tournament on $n$ players is at most $n!/(2 - o(1))^n$. This was proved in Alon (1990a)
and is presented in The Probabilistic Lens: Hamiltonian Paths (following Chapter 4).


2.2 SPLITTING GRAPHS 


Theorem 2.2.1 Let G = (V, E) be a graph with n vertices and e edges. Then G 
contains a bipartite subgraph with at least e/2 edges. 


Proof. Let $T \subseteq V$ be a random subset given by $\Pr[x \in T] = 1/2$, these choices
being mutually independent. Set $B = V - T$. Call an edge $\{x,y\}$ crossing if exactly
one of $x, y$ is in $T$. Let $X$ be the number of crossing edges. We decompose

\[ X = \sum_{\{x,y\} \in E} X_{xy}, \]

where $X_{xy}$ is the indicator random variable for $\{x,y\}$ being crossing. Then

\[ E[X_{xy}] = \frac{1}{2} \]

as two fair coin flips have probability 1/2 of being different. Then

\[ E[X] = \sum_{\{x,y\} \in E} E[X_{xy}] = \frac{e}{2}. \]

Thus $X \ge e/2$ for some choice of $T$ and the set of those crossing edges form a
bipartite graph. ∎


A more subtle probability space gives a small improvement (which is tight for 
complete graphs). 


Theorem 2.2.2 If $G$ has $2n$ vertices and $e$ edges then it contains a bipartite subgraph
with at least $en/(2n-1)$ edges. If $G$ has $2n+1$ vertices and $e$ edges then it contains
a bipartite subgraph with at least $e(n+1)/(2n+1)$ edges.

Proof. When $G$ has $2n$ vertices let $T$ be chosen uniformly from among all $n$-element
subsets of $V$. Any edge $\{x,y\}$ now has probability $n/(2n-1)$ of being crossing
and the proof concludes as before. When $G$ has $2n+1$ vertices choose $T$ uniformly
from among all $n$-element subsets of $V$ and the proof is similar. ∎


Here is a more complicated example in which the choice of distribution requires
a preliminary lemma. Let $V = V_1 \cup \cdots \cup V_k$, where the $V_i$ are disjoint sets of size
$n$. Let $h : [V]^k \to \{\pm 1\}$ be a two-coloring of the $k$-sets. A $k$-set $E$ is crossing if it
contains precisely one point from each $V_i$. For $S \subseteq V$ set $h(S) = \sum h(E)$, the sum
over all $k$-sets $E \subseteq S$.

Theorem 2.2.3 Suppose $h(E) = +1$ for all crossing $k$-sets $E$. Then there is an
$S \subseteq V$ for which

\[ |h(S)| \ge c_k n^k. \]

Here $c_k$ is a positive constant, independent of $n$.

Lemma 2.2.4 Let $P_k$ denote the set of all homogeneous polynomials $f(p_1,\ldots,p_k)$
of degree $k$ with all coefficients having absolute value at most one and $p_1p_2\cdots p_k$
having coefficient one. Then for all $f \in P_k$ there exist $p_1,\ldots,p_k \in [0,1]$ with

\[ |f(p_1,\ldots,p_k)| \ge c_k. \]

Here $c_k$ is positive and independent of $f$.

Proof. Set

\[ M(f) = \max_{p_1,\ldots,p_k \in [0,1]} |f(p_1,\ldots,p_k)|. \]

For $f \in P_k$, $M(f) > 0$ as $f$ is not the zero polynomial. As $P_k$ is compact and
$M : P_k \to R$ is continuous, $M$ must assume its minimum $c_k$. ∎


Proof [Theorem 2.2.3]. Define a random $S \subseteq V$ by setting

\[ \Pr[x \in S] = p_i, \quad x \in V_i, \]

these choices being mutually independent, with $p_i$ to be determined. Set $X = h(S)$.
For each $k$-set $E$ set

\[ X_E = \begin{cases} h(E) & \text{if } E \subseteq S, \\ 0 & \text{otherwise.} \end{cases} \]

Say $E$ has type $(a_1,\ldots,a_k)$ if $|E \cap V_i| = a_i$, $1 \le i \le k$. For these $E$,

\[ E[X_E] = h(E)\Pr[E \subseteq S] = h(E)\, p_1^{a_1} \cdots p_k^{a_k}. \]

Combining terms by type

\[ E[X] = \sum_{a_1 + \cdots + a_k = k} p_1^{a_1} \cdots p_k^{a_k} \sum_{E \text{ of type } (a_1,\ldots,a_k)} h(E). \]

When $a_1 = \cdots = a_k = 1$ all $h(E) = 1$ by assumption so

\[ \sum_{E \text{ of type } (1,\ldots,1)} h(E) = n^k. \]

For any other type there are fewer than $n^k$ terms, each $\pm 1$, so

\[ \Bigl| \sum_{E \text{ of type } (a_1,\ldots,a_k)} h(E) \Bigr| \le n^k. \]

Thus

\[ E[X] = n^k f(p_1,\ldots,p_k), \]

where $f \in P_k$, as defined by Lemma 2.2.4.
Now select $p_1,\ldots,p_k \in [0,1]$ with $|f(p_1,\ldots,p_k)| \ge c_k$. Then

\[ E[|X|] \ge |E[X]| \ge c_k n^k. \]

Some particular value of $|X|$ must exceed or equal its expectation. Hence there is a
particular set $S \subseteq V$ with $|X| = |h(S)| \ge c_k n^k$. ∎


Theorem 2.2.3 has an interesting application to Ramsey Theory. It is known [see
Erdős (1965b)] that given any coloring with two colors of the $k$-sets of an $n$-set there
exist $k$ disjoint $m$-sets, $m = \Theta((\ln n)^{1/(k-1)})$, so that all crossing $k$-sets are the
same color. From Theorem 2.2.3 there then exists a set of size $\Theta((\ln n)^{1/(k-1)})$, at
least $\frac{1}{2} + \epsilon_k$ of whose $k$-sets are the same color. This is somewhat surprising since it
is known that there are colorings in which the largest monochromatic set has size at
most the $(k-2)$-fold logarithm of $n$.


2.3 TWO QUICKIES 


Linearity of expectation sometimes gives very quick results. 


Theorem 2.3.1 There is a two-coloring of $K_n$ with at most

\[ \binom{n}{a} 2^{1 - \binom{a}{2}} \]

monochromatic $K_a$.

Proof [Outline]. Take a random coloring. Let $X$ be the number of monochromatic
$K_a$ and find $E[X]$. For some coloring the value of $X$ is at most this expectation. ∎

In Chapter 16 it is shown how such a coloring can be found deterministically and
efficiently.

Theorem 2.3.2 There is a two-coloring of $K_{m,n}$ with at most

\[ \binom{m}{a}\binom{n}{b} 2^{1 - ab} \]

monochromatic $K_{a,b}$.

Proof [Outline]. Take a random coloring. Let $X$ be the number of monochromatic
$K_{a,b}$ and find $E[X]$. For some coloring the value of $X$ is at most this expectation. ∎
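The outline of Theorem 2.3.1 can be made concrete for a tiny case by averaging over every two-coloring. The sketch below takes $n = 5$ and $a = 3$; the function name is illustrative, and full enumeration is only practical for such small parameters.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb

def monochromatic_counts(n, a):
    """Number of monochromatic K_a in each of the 2^C(n,2) two-colorings of K_n."""
    edges = list(combinations(range(n), 2))
    cliques = [list(combinations(c, 2)) for c in combinations(range(n), a)]
    counts = []
    for colors in product((0, 1), repeat=len(edges)):
        color = dict(zip(edges, colors))
        counts.append(sum(len({color[e] for e in clique}) == 1 for clique in cliques))
    return counts

n, a = 5, 3
counts = monochromatic_counts(n, a)
average = Fraction(sum(counts), len(counts))
bound = Fraction(comb(n, a) * 2, 2 ** comb(a, 2))   # C(n,a) * 2^(1 - C(a,2))
print(average, bound, min(counts))
```

The average equals the bound of the theorem, so the best coloring has at most that many monochromatic copies.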


2.4 BALANCING VECTORS 
The next result has an elegant nonprobabilistic proof, which we defer to the end of
this chapter. Here $|v|$ is the usual Euclidean norm.

Theorem 2.4.1 Let $v_1,\ldots,v_n \in R^n$, all $|v_i| = 1$. Then there exist $\epsilon_1,\ldots,\epsilon_n = \pm 1$
so that

\[ |\epsilon_1v_1 + \cdots + \epsilon_nv_n| \le \sqrt{n}, \]

and also there exist $\epsilon_1,\ldots,\epsilon_n = \pm 1$ so that

\[ |\epsilon_1v_1 + \cdots + \epsilon_nv_n| \ge \sqrt{n}. \]

Proof. Let $\epsilon_1,\ldots,\epsilon_n$ be selected uniformly and independently from $\{-1,+1\}$. Set

\[ X = |\epsilon_1v_1 + \cdots + \epsilon_nv_n|^2. \]

Then

\[ X = \sum_{i=1}^{n}\sum_{j=1}^{n} \epsilon_i\epsilon_j\, v_i \cdot v_j. \]

Thus

\[ E[X] = \sum_{i=1}^{n}\sum_{j=1}^{n} v_i \cdot v_j\, E[\epsilon_i\epsilon_j]. \]

When $i \ne j$, $E[\epsilon_i\epsilon_j] = E[\epsilon_i]E[\epsilon_j] = 0$. When $i = j$, $\epsilon_i^2 = 1$ so $E[\epsilon_i^2] = 1$. Thus

\[ E[X] = \sum_{i=1}^{n} v_i \cdot v_i = n. \]

Hence there exist specific $\epsilon_1,\ldots,\epsilon_n = \pm 1$ with $X \ge n$ and with $X \le n$. Taking
square roots gives the theorem. ∎
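The identity $E[X] = n$ can be observed numerically by averaging $|\epsilon_1v_1 + \cdots + \epsilon_nv_n|^2$ over all $2^n$ sign patterns. The sketch below uses randomly generated unit vectors; the seed, the dimension and the helper names are arbitrary choices made for illustration.

```python
import math
import random
from itertools import product

def squared_norms(vectors):
    """|e_1 v_1 + ... + e_n v_n|^2 for every sign pattern e in {-1,+1}^n."""
    dim = len(vectors[0])
    result = []
    for signs in product((-1, 1), repeat=len(vectors)):
        total = [sum(s * v[c] for s, v in zip(signs, vectors)) for c in range(dim)]
        result.append(sum(t * t for t in total))
    return result

def random_unit_vector(dim, rng):
    """A vector with Gaussian coordinates, scaled to Euclidean norm 1."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]

rng = random.Random(0)          # fixed seed, an arbitrary choice
n = 6
vectors = [random_unit_vector(n, rng) for _ in range(n)]
values = squared_norms(vectors)
mean = sum(values) / len(values)
print(mean, min(values), max(values))
```

Up to floating-point error the mean is $n$, so the smallest value is at most $n$ and the largest is at least $n$, matching the two halves of the theorem.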


The next result includes part of Theorem 2.4.1 as a linear translation of the
$p_1 = \cdots = p_n = 1/2$ case.

Theorem 2.4.2 Let $v_1,\ldots,v_n \in R^n$, all $|v_i| \le 1$. Let $p_1,\ldots,p_n \in [0,1]$ be
arbitrary and set $w = p_1v_1 + \cdots + p_nv_n$. Then there exist $\epsilon_1,\ldots,\epsilon_n \in \{0,1\}$ so
that, setting $v = \epsilon_1v_1 + \cdots + \epsilon_nv_n$,

\[ |w - v| \le \frac{\sqrt{n}}{2}. \]

Proof. Pick $\epsilon_i$ independently with

\[ \Pr[\epsilon_i = 1] = p_i, \quad \Pr[\epsilon_i = 0] = 1 - p_i. \]

The random choice of $\epsilon_i$ gives a random $v$ and a random variable

\[ X = |w - v|^2. \]

We expand

\[ X = \Bigl| \sum_{i=1}^{n} (p_i - \epsilon_i)v_i \Bigr|^2 = \sum_{i=1}^{n}\sum_{j=1}^{n} v_i \cdot v_j\,(p_i - \epsilon_i)(p_j - \epsilon_j) \]

so that

\[ E[X] = \sum_{i=1}^{n}\sum_{j=1}^{n} v_i \cdot v_j\, E[(p_i - \epsilon_i)(p_j - \epsilon_j)]. \]

For $i \ne j$,

\[ E[(p_i - \epsilon_i)(p_j - \epsilon_j)] = E[p_i - \epsilon_i]\,E[p_j - \epsilon_j] = 0. \]

For $i = j$,

\[ E[(p_i - \epsilon_i)^2] = p_i(p_i - 1)^2 + (1 - p_i)p_i^2 = p_i(1 - p_i) \le \frac{1}{4}. \]

($E[(p_i - \epsilon_i)^2] = \mathrm{Var}[\epsilon_i]$, the variance to be discussed in Chapter 4.) Thus

\[ E[X] = \sum_{i=1}^{n} p_i(1 - p_i)|v_i|^2 \le \frac{1}{4}\sum_{i=1}^{n} |v_i|^2 \le \frac{n}{4} \]

and the proof concludes as in that of Theorem 2.4.1. ∎


2.5 UNBALANCING LIGHTS


Theorem 2.5.1 Let $a_{ij} = \pm 1$ for $1 \le i,j \le n$. Then there exist $x_i, y_j = \pm 1$,
$1 \le i,j \le n$ so that

\[ \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}x_iy_j \ge \left(\sqrt{\frac{2}{\pi}} + o(1)\right) n^{3/2}. \]

This result has an amusing interpretation. Let an $n \times n$ array of lights be given,
each either on ($a_{ij} = +1$) or off ($a_{ij} = -1$). Suppose for each row and each column
there is a switch so that if the switch is pulled ($x_i = -1$ for row $i$ and $y_j = -1$ for
column $j$) all of the lights in that line are "switched": on to off or off to on. Then
for any initial configuration it is possible to perform switches so that the number of
lights on minus the number of lights off is at least $(\sqrt{2/\pi} + o(1))n^{3/2}$.


Proof. Forget the $x$'s. Let $y_1,\ldots,y_n = \pm 1$ be selected independently and uniformly
and set

\[ R_i = \sum_{j=1}^{n} a_{ij}y_j, \qquad R = \sum_{i=1}^{n} |R_i|. \]

Fix $i$. Regardless of $a_{ij}$, $a_{ij}y_j$ is $\pm 1$ with probability 1/2 and their values (over $j$)
are independent; that is, whatever the $i$th row is initially after random switching it
becomes a uniformly distributed row, all $2^n$ possibilities equally likely. Thus $R_i$
has distribution $S_n$ (the distribution of the sum of $n$ independent uniform $\{-1,1\}$
random variables) and so

\[ E[|R_i|] = E[|S_n|] = \left(\sqrt{\frac{2}{\pi}} + o(1)\right)\sqrt{n}. \]

These asymptotics may be found by estimating $S_n$ by $\sqrt{n}N$, where $N$ is standard
normal and using elementary calculus. Alternatively, a closed form

\[ E[|S_n|] = n\,2^{1-n}\binom{n-1}{\lfloor (n-1)/2 \rfloor} \]

may be derived combinatorially (a problem in the 1974 Putnam competition!) and
the asymptotics follows from Stirling's formula.
Now apply linearity of expectation to $R$:

\[ E[R] = \sum_{i=1}^{n} E[|R_i|] = \left(\sqrt{\frac{2}{\pi}} + o(1)\right) n^{3/2}. \]

There exist $y_1,\ldots,y_n = \pm 1$ with $R$ at least this value. Finally, pick $x_i$ with the
same sign as $R_i$ so that

\[ \sum_{i=1}^{n} x_i \sum_{j=1}^{n} a_{ij}y_j = \sum_{i=1}^{n} x_iR_i = \sum_{i=1}^{n} |R_i| = R \ge \left(\sqrt{\frac{2}{\pi}} + o(1)\right) n^{3/2}. \] ∎

Another result on unbalancing lights appears in The Probabilistic Lens: Unbalancing
Lights (following Chapter 13). The existence of Hadamard matrices and
the discussion in Section 9.1 show that the estimate in the last theorem cannot be
improved to anything bigger than $n^{3/2}$.
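Two steps of the proof of Theorem 2.5.1 lend themselves to a quick numerical check: the closed form for $E[|S_n|]$, and the final step of choosing each $x_i$ with the sign of $R_i$. The sketch below is illustrative only. It replaces the random choice of $y$ by an exhaustive search over all $2^n$ column vectors, which is an assumption made so that the result is deterministic; the names and the seed are not from the text.

```python
from fractions import Fraction
from itertools import product
from math import comb
import random

def mean_abs_sum_exact(n):
    """E[|S_n|] by enumerating all 2^n sign vectors."""
    total = sum(abs(sum(signs)) for signs in product((-1, 1), repeat=n))
    return Fraction(total, 2 ** n)

def mean_abs_sum_closed(n):
    """The closed form n * 2^(1-n) * C(n-1, floor((n-1)/2))."""
    return Fraction(n * comb(n - 1, (n - 1) // 2), 2 ** (n - 1))

def best_switching(a):
    """Try every column vector y; each row then takes x_i = sign(R_i)."""
    n = len(a)
    best = None
    for y in product((-1, 1), repeat=n):
        row_sums = [sum(a[i][j] * y[j] for j in range(n)) for i in range(n)]
        x = [1 if r >= 0 else -1 for r in row_sums]
        value = sum(abs(r) for r in row_sums)
        if best is None or value > best[0]:
            best = (value, x, list(y))
    return best

rng = random.Random(1)          # fixed seed, an arbitrary choice
n = 6
a = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(n)]
value, x, y = best_switching(a)
print(value, float(n * mean_abs_sum_closed(n)))
```

Since the average of $R$ over all $y$ is $n\,E[|S_n|]$, the best $y$ found by the search gives a value at least that large.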


2.6 WITHOUT COIN FLIPS 


A nonprobabilistic proof of Theorem 2.2.1 may be given by placing each vertex in
either $T$ or $B$ sequentially. At each stage place $x$ in either $T$ or $B$ so that at least half
of the edges from $x$ to previous vertices are crossing. With this effective algorithm
at least half the edges will be crossing.
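The sequential placement just described can be sketched as follows. The sketch assumes a simple graph whose vertices are labeled $0,\ldots,n-1$ and processed in that order; the function names and the tie-breaking rule are illustrative choices.

```python
def greedy_bipartition(n, edges):
    """Place vertices 0..n-1 one at a time so that at least half of the
    edges from each vertex back to earlier vertices are crossing."""
    neighbors = {v: set() for v in range(n)}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    side = {}
    for x in range(n):
        in_t = sum(1 for u in neighbors[x] if side.get(u) == "T")
        in_b = sum(1 for u in neighbors[x] if side.get(u) == "B")
        # Joining the side with fewer earlier neighbors makes the majority cross.
        side[x] = "B" if in_t >= in_b else "T"
    return side

def crossing(edges, side):
    """Number of edges with endpoints on different sides."""
    return sum(1 for u, v in edges if side[u] != side[v])

edges = [(u, v) for u in range(6) for v in range(u + 1, 6)]   # K_6, 15 edges
side = greedy_bipartition(6, edges)
print(crossing(edges, side), len(edges))
```

Each edge is counted once, at the moment its later endpoint is placed, so the guarantee for every vertex adds up to at least half of all edges.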

There is also a simple sequential algorithm for choosing signs in Theorem 2.4.1.
When the sign for $v_i$ is to be chosen, a partial sum $w = \epsilon_1v_1 + \cdots + \epsilon_{i-1}v_{i-1}$ has
been calculated. Now if it is desired that the sum be small select $\epsilon_i = \pm 1$ so that
$\epsilon_iv_i$ makes an obtuse (or right) angle with $w$. If the sum need be big make the angle
acute or right. In the extreme case when all angles are right angles, Pythagoras and
induction give that the final $w$ has norm $\sqrt{n}$, otherwise it is either less than $\sqrt{n}$ or
greater than $\sqrt{n}$ as desired.

For Theorem 2.4.2 a greedy algorithm produces the desired $\epsilon_i$. Given $v_1,\ldots,v_n \in R^n$,
$p_1,\ldots,p_n \in [0,1]$ suppose $\epsilon_1,\ldots,\epsilon_{s-1} \in \{0,1\}$ have already been chosen. Set
$w_{s-1} = \sum_{i=1}^{s-1} (p_i - \epsilon_i)v_i$, the partial sum. Select $\epsilon_s$ so that

\[ w_s = w_{s-1} + (p_s - \epsilon_s)v_s = \sum_{i=1}^{s} (p_i - \epsilon_i)v_i \]

has minimal norm. A random $\epsilon_s \in \{0,1\}$ chosen with $\Pr[\epsilon_s = 1] = p_s$ gives

\[ E[|w_s|^2] = |w_{s-1}|^2 + 2w_{s-1} \cdot v_s\,E[p_s - \epsilon_s] + |v_s|^2 E[(p_s - \epsilon_s)^2] = |w_{s-1}|^2 + p_s(1 - p_s)|v_s|^2 \]

so for some choice of $\epsilon_s \in \{0,1\}$,

\[ |w_s|^2 \le |w_{s-1}|^2 + p_s(1 - p_s)|v_s|^2. \]

As this holds for all $1 \le s \le n$ (taking $w_0 = 0$), the final

\[ |w_n|^2 \le \sum_{i=1}^{n} p_i(1 - p_i)|v_i|^2. \]

While the proofs appear similar, a direct implementation of the proof of Theorem 2.4.2
to find $\epsilon_1,\ldots,\epsilon_n$ might take an exhaustive search with exponential time. In applying
the greedy algorithm at the $s$th stage one makes two calculations of $|w_s|^2$, depending
on whether $\epsilon_s = 0$ or 1, and picks that $\epsilon_s$ giving the smaller value. Hence there are
only a linear number of calculations of norms to be made and the entire algorithm
takes only quadratic time. In Chapter 16 we discuss several similar examples in a
more general setting.
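The greedy algorithm for Theorem 2.4.2 is short enough to write out. The following is a minimal sketch under stated assumptions: vectors are plain Python lists, the test vectors are generated at random with coordinates scaled so that $|v_i| \le 1$, and the function name and seed are illustrative.

```python
import random

def greedy_round(vectors, p):
    """Choose eps_s in {0,1} one at a time so that |w_s| is minimal, where
    w_s = sum over i <= s of (p_i - eps_i) v_i. Returns (eps, |w_n|^2)."""
    dim = len(vectors[0])
    w = [0.0] * dim
    eps = []
    for v, ps in zip(vectors, p):
        candidates = []
        for e in (0, 1):
            w_next = [wc + (ps - e) * vc for wc, vc in zip(w, v)]
            candidates.append((sum(c * c for c in w_next), e, w_next))
        # Two norm computations per stage; keep the smaller one.
        _, e, w = min(candidates, key=lambda t: t[0])
        eps.append(e)
    return eps, sum(c * c for c in w)

rng = random.Random(2)          # fixed seed, an arbitrary choice
n = 8
# Coordinates in [-1/sqrt(n), 1/sqrt(n)] guarantee |v_i| <= 1.
vectors = [[rng.uniform(-1, 1) / n ** 0.5 for _ in range(n)] for _ in range(n)]
p = [rng.random() for _ in range(n)]
eps, sq_norm = greedy_round(vectors, p)
bound = sum(pi * (1 - pi) * sum(c * c for c in v) for pi, v in zip(p, vectors))
print(eps, sq_norm, bound)
```

Each stage costs two norm evaluations of length-$n$ vectors, which is where the quadratic running time comes from.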


2.7 EXERCISES


1. Suppose $n \ge 2$ and let $H = (V,E)$ be an $n$-uniform hypergraph with $|E| = 4^{n-1}$
edges. Show that there is a coloring of $V$ by four colors so that no edge
is monochromatic.

2. Prove that there is a positive constant $c$ so that every set $A$ of $n$ nonzero reals
contains a subset $B \subseteq A$ of size $|B| \ge cn$ so that there are no $b_1, b_2, b_3, b_4 \in B$
satisfying

\[ b_1 + 2b_2 = 2b_3 + 2b_4. \]

3. Prove that every set of $n$ nonzero real numbers contains a subset $A$ of strictly
more than $n/3$ numbers such that there are no $a_1, a_2, a_3 \in A$ satisfying
$a_1 + a_2 = a_3$.

4. Suppose $p > n > 10m^2$, with $p$ prime, and let $0 < a_1 < a_2 < \cdots < a_m < p$
be integers. Prove that there is an integer $x$, $0 < x < p$ for which the $m$
numbers

\[ (xa_i \bmod p) \bmod n, \quad 1 \le i \le m \]

are pairwise distinct.

5. Let $H$ be a graph, and let $n > |V(H)|$ be an integer. Suppose there is a
graph on $n$ vertices and $t$ edges containing no copy of $H$, and suppose that
$tk > n^2 \log_e n$. Show that there is a coloring of the edges of the complete
graph on $n$ vertices by $k$ colors with no monochromatic copy of $H$.

6. (*) Prove, using the technique shown in The Probabilistic Lens: Hamiltonian
Paths, that there is a constant $c > 0$ such that for every even $n \ge 4$ the following
holds: For every undirected complete graph $K$ on $n$ vertices whose edges are
colored red and blue, the number of alternating Hamiltonian cycles in $K$ (i.e.,
properly edge-colored cycles of length $n$) is at most

\[ n^c \frac{n!}{2^n}. \]

7. Let $\mathcal{F}$ be a family of subsets of $N = \{1,2,\ldots,n\}$, and suppose there are no
$A, B \in \mathcal{F}$ satisfying $A \subset B$. Let $\sigma \in S_n$ be a random permutation of the
elements of $N$ and consider the random variable

\[ X = |\{i : \{\sigma(1), \sigma(2), \ldots, \sigma(i)\} \in \mathcal{F}\}|. \]

By considering the expectation of $X$ prove that $|\mathcal{F}| \le \binom{n}{\lfloor n/2 \rfloor}$.

8. (*) Let $X$ be a collection of pairwise orthogonal unit vectors in $R^n$ and suppose
the projection of each of these vectors on the first $k$ coordinates is of Euclidean
norm at least $\epsilon$. Show that $|X| \le k/\epsilon^2$, and this is tight for all $\epsilon^2 = k/2^r < 1$.

9. Let $G = (V,E)$ be a bipartite graph with $n$ vertices and a list $S(v)$ of more
than $\log_2 n$ colors associated with each vertex $v \in V$. Prove that there is a
proper coloring of $G$ assigning to each vertex $v$ a color from its list $S(v)$.


THE PROBABILISTIC LENS: 
Brégman’s Theorem 


Let $A = [a_{ij}]$ be an $n \times n$ matrix with all $a_{ij} \in \{0,1\}$. Let $r_i = \sum_{1 \le j \le n} a_{ij}$ be
the number of ones in the $i$th row. Let $S$ be the set of permutations $\sigma \in S_n$ with
$a_{i,\sigma i} = 1$ for $1 \le i \le n$. Then the permanent $\mathrm{per}(A)$ is simply $|S|$. The following
result was conjectured by Minc and proved by Brégman (1973). The proof presented
here is similar to that of Schrijver (1978).

Theorem 1 [Brégman's Theorem]

\[ \mathrm{per}(A) \le \prod_{1 \le i \le n} (r_i!)^{1/r_i}. \]
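Brégman's bound is easy to test exhaustively on small matrices. The sketch below checks every $3 \times 3$ matrix with entries in $\{0,1\}$; the function names are illustrative, the permanent is computed by brute force over permutations, and a small tolerance is an assumption added to absorb floating-point error in the fractional powers.

```python
from itertools import permutations, product
from math import factorial

def permanent(a):
    """per(A) for a 0-1 matrix: the number of permutations hitting only ones."""
    n = len(a)
    return sum(all(a[i][s[i]] for i in range(n)) for s in permutations(range(n)))

def bregman_bound(a):
    """Product over rows of (r_i!)^(1/r_i); zero if some row has no ones."""
    bound = 1.0
    for row in a:
        r = sum(row)
        if r == 0:
            return 0.0
        bound *= factorial(r) ** (1.0 / r)
    return bound

n = 3
matrices = [
    [list(bits[i * n:(i + 1) * n]) for i in range(n)]
    for bits in product((0, 1), repeat=n * n)
]
violations = [a for a in matrices if permanent(a) > bregman_bound(a) + 1e-9]
print(len(matrices), len(violations))
```

The all-ones matrix shows that the bound can hold with equality, since its permanent is $n!$ and every row contributes $(n!)^{1/n}$.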


Pick $\sigma \in S$ and $\tau \in S_n$ independently and uniformly. Set $A^{(1)} = A$. Let $R_{\tau 1}$
be the number of ones in row $\tau 1$ in $A^{(1)}$. Delete row $\tau 1$ and column $\sigma\tau 1$ from $A^{(1)}$
to give $A^{(2)}$. In general, let $A^{(i)}$ denote $A$ with rows $\tau 1,\ldots,\tau(i-1)$ and columns
$\sigma\tau 1,\ldots,\sigma\tau(i-1)$ deleted and let $R_{\tau i}$ denote the number of ones of row $\tau i$ in $A^{(i)}$.
(This is nonzero as the $\sigma\tau i$th column has a one.) Set

\[ L = L(\sigma,\tau) = \prod_{1 \le i \le n} R_{\tau i}. \]

We think, roughly, of $L$ as Lazyman's permanent calculation. There are $R_{\tau 1}$
choices for a one in row $\tau 1$, each of which leads to a different subpermanent
calculation. Instead, Lazyman takes the factor $R_{\tau 1}$, takes the one from permutation $\sigma$,
and examines $A^{(2)}$. As $\sigma \in S$ is chosen uniformly Lazyman tends toward the high
subpermanents and so it should not be surprising that he tends to overestimate the
permanent. To make this precise we define the geometric mean $G[Y]$. If $Y > 0$ takes
values $a_1,\ldots,a_s$ with probabilities $p_1,\ldots,p_s$, respectively, then $G[Y] = \prod a_i^{p_i}$.
Equivalently, $G[Y] = e^{E[\ln Y]}$. Linearity of expectation translates into the geometric
mean of a product being the product of the geometric means.


Claim 2.7.1 $\mathrm{per}(A) \le G[L]$.

Proof. We show this for any fixed $\tau$. Set $\tau 1 = 1$ for convenience of notation. We use
induction on the size of the matrix. Reorder, for convenience, so that the first row has
ones in the first $r$ columns, where $r = r_1$. For $1 \le j \le r$ let $t_j$ be the permanent of
$A$ with the first row and $j$th column removed or, equivalently, the number of $\sigma \in S$
with $\sigma 1 = j$. Set

\[ t = \frac{t_1 + \cdots + t_r}{r} \]

so that $\mathrm{per}(A) = rt$. Conditioning on $\sigma 1 = j$, $R_2 \cdots R_n$ is Lazyman's calculation of
$\mathrm{per}(A^{(2)})$, where $A^{(2)}$ is $A$ with the first row and $j$th column removed. By induction

\[ G[R_2 \cdots R_n \mid \sigma 1 = j] \ge t_j \]

and so

\[ G[L] \ge \prod_{j=1}^{r} (rt_j)^{t_j/\mathrm{per}(A)} = r \prod_{j=1}^{r} t_j^{t_j/rt}. \]

Lemma 2

\[ \Bigl( \prod_{j=1}^{r} t_j^{t_j} \Bigr)^{1/r} \ge t^t. \]

Proof. Taking logarithms, this is equivalent to

\[ \frac{1}{r} \sum_{j=1}^{r} t_j \ln t_j \ge t \ln t, \]

which follows from the convexity of the function $f(x) = x \ln x$. ∎

Applying the lemma,

\[ G[L] \ge r \prod_{j=1}^{r} t_j^{t_j/rt} \ge r(t^t)^{1/t} = rt = \mathrm{per}(A). \]


Now we calculate $G[L]$ conditional on a fixed $\sigma$. For convenience of notation
reorder so that $\sigma i = i$, all $i$, and assume that the first row has ones in precisely the
first $r_1$ columns. With $\tau$ selected uniformly the columns $1,\ldots,r_1$ are deleted in
order uniform over all $r_1!$ possibilities. $R_1$ is the number of those columns remaining
when the first column is to be deleted. As the first column is equally likely to be in
any position among those $r_1$ columns $R_1$ is uniformly distributed from 1 to $r_1$ and
$G[R_1] = (r_1!)^{1/r_1}$. "Linearity" then gives

\[ G[L] = G\Bigl[\prod_{i=1}^{n} R_i\Bigr] = \prod_{i=1}^{n} G[R_i] = \prod_{i=1}^{n} (r_i!)^{1/r_i}. \]

The overall $G[L]$ is the geometric mean of the conditional $G[L]$ and hence has the
same value. That is,

\[ \mathrm{per}(A) \le G[L] = \prod_{i=1}^{n} (r_i!)^{1/r_i}. \]


Beauty is the first test: there is no permanent place in the world for ugly
mathematics.


— G. H. Hardy 


The basic probabilistic method was described in Chapter 1 as follows: Trying to
prove that a structure with certain desired properties exists, one defines an appropriate 
probability space of structures and then shows that the desired properties hold in this 
space with positive probability. In this chapter we consider situations where the 
“random” structure does not have all the desired properties but may have a few 
“blemishes.” With a small alteration we remove the blemishes, giving the desired 
structure. 


3.1 RAMSEY NUMBERS 


Recall from Section 1.1 that $R(k,l) > n$ means there exists a two-coloring of the
edges of $K_n$ by red and blue so that there is neither a red $K_k$ nor a blue $K_l$.

Theorem 3.1.1 For any integer $n$, $R(k,k) > n - \binom{n}{k} 2^{1-\binom{k}{2}}$.

Proof. Consider a random two-coloring of the edges of $K_n$ obtained by coloring
each edge independently either red or blue, where each color is equally likely. For
any set $R$ of $k$ vertices let $X_R$ be the indicator random variable for the event that the
induced subgraph of $K_n$ on $R$ is monochromatic. Set $X = \sum X_R$, the sum over all
such $R$. By linearity of expectation, $E[X] = \sum E[X_R] = m$ with $m = \binom{n}{k} 2^{1-\binom{k}{2}}$.
Thus there exists a two-coloring for which $X \le m$. Fix such a coloring. Remove
from $K_n$ one vertex from each monochromatic $k$-set. At most $m$ vertices have been
removed (we may have "removed" the same vertex more than once but this only
helps) so $s$ vertices remain with $s \ge n - m$. This coloring on these $s$ points has no
monochromatic $k$-set. ∎


We are left with the "calculus" problem of finding that $n$ which will optimize the
inequality. Some analysis shows that we should take $n \sim e^{-1}k2^{k/2}(1 - o(1))$ giving

\[ R(k,k) > \frac{1}{e}(1 + o(1))k2^{k/2}. \]

A careful examination of Proposition 1.1.1 gives the lower bound

\[ R(k,k) > \frac{1}{e\sqrt{2}}(1 + o(1))k2^{k/2}. \]

The more powerful Lovász Local Lemma (see Chapter 5) gives

\[ R(k,k) > \frac{\sqrt{2}}{e}(1 + o(1))k2^{k/2}. \]

The distinctions between these bounds may be considered inconsequential since the
best known upper bound for $R(k,k)$ is $(4 + o(1))^k$. The upper bounds do not involve
probabilistic methods and may be found, for example, in Graham, Rothschild and
Spencer (1990). We give all three lower bounds in following our philosophy of
emphasizing methodologies rather than results.
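The "calculus" problem can also be explored numerically for a fixed $k$. The sketch below maximizes the bound of Theorem 3.1.1 over a range of $n$ and compares it with the basic method, which is assumed here to be the condition $\binom{n}{k}2^{1-\binom{k}{2}} < 1$ of Proposition 1.1.1. The function names and the search limit `n_max` are illustrative, and floating-point arithmetic is used, so the values are approximate.

```python
from math import comb

def alteration_bound(n, k):
    """n - C(n,k) * 2^(1 - C(k,2)), the quantity in Theorem 3.1.1."""
    return n - comb(n, k) * 2.0 ** (1 - comb(k, 2))

def basic_bound(k):
    """Largest n with C(n,k) * 2^(1 - C(k,2)) < 1 (assumed basic method)."""
    n = k
    while comb(n + 1, k) * 2.0 ** (1 - comb(k, 2)) < 1:
        n += 1
    return n

def best_alteration(k, n_max):
    """Maximize the alteration bound over k <= n <= n_max; returns (value, n)."""
    return max((alteration_bound(n, k), n) for n in range(k, n_max + 1))

k = 10
value, n_star = best_alteration(k, 300)
print(basic_bound(k), n_star, value)
```

The maximizing $n$ is larger than the basic-method threshold: the alteration method tolerates some monochromatic $k$-sets and pays for them by deleting vertices.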

In dealing with the off-diagonal Ramsey numbers the distinction between the basic 
method and the alteration is given in the following two results. 


Theorem 3.1.2 If there exists $p \in [0,1]$ with

\[ \binom{n}{k} p^{\binom{k}{2}} + \binom{n}{l} (1-p)^{\binom{l}{2}} < 1 \]

then $R(k,l) > n$.

Theorem 3.1.3 For all integers $n$ and $p \in [0,1]$,

\[ R(k,l) > n - \binom{n}{k} p^{\binom{k}{2}} - \binom{n}{l} (1-p)^{\binom{l}{2}}. \]

Proof. In both cases we consider a random two-coloring of $K_n$, obtained by coloring
each edge independently either red or blue, where each edge is red with probability
$p$. Let $X$ be the number of red $k$-sets plus the number of blue $l$-sets. Linearity of
expectation gives

\[ E[X] = \binom{n}{k} p^{\binom{k}{2}} + \binom{n}{l} (1-p)^{\binom{l}{2}}. \]

For Theorem 3.1.2, $E[X] < 1$ so there exists a two-coloring with $X = 0$. For
Theorem 3.1.3 there exists a two-coloring with $s$ "bad" sets (either red $k$-sets or blue
$l$-sets), $s \le E[X]$. Removing one point from each bad set gives a coloring of at least
$n - s$ points with no bad sets. ∎

The asymptotics of Theorems 3.1.2 and 3.1.3 can get fairly complex. Oftentimes
Theorem 3.1.3 gives a substantial improvement on Theorem 3.1.2. Even further
improvements may be found using the Lovász Local Lemma. These bounds have
been analyzed in Spencer (1977).


3.2 INDEPENDENT SETS 


Here is a short and sweet argument that gives roughly half of the celebrated Turán's
Theorem. $\alpha(G)$ is the independence number of a graph $G$; $\alpha(G) \ge t$ means there
exist $t$ vertices with no edges between them.

Theorem 3.2.1 Let $G = (V,E)$ have $n$ vertices and $nd/2$ edges, $d \ge 1$. Then
$\alpha(G) \ge n/2d$.

Proof. Let $S \subseteq V$ be a random subset defined by

\[ \Pr[v \in S] = p, \]

$p$ to be determined, the events $v \in S$ being mutually independent. Let $X = |S|$ and
let $Y$ be the number of edges in $G|_S$. For each $e = \{i,j\} \in E$ let $Y_e$ be the indicator
random variable for the event $i,j \in S$ so that $Y = \sum_{e \in E} Y_e$. For any such $e$,

\[ E[Y_e] = \Pr[i,j \in S] = p^2, \]

so by linearity of expectation,

\[ E[Y] = \sum_{e \in E} E[Y_e] = \frac{nd}{2}p^2. \]

Clearly $E[X] = np$, so, again by linearity of expectation,

\[ E[X - Y] = np - \frac{nd}{2}p^2. \]

We set $p = 1/d$ (here using $d \ge 1$) to maximize this quantity, giving

\[ E[X - Y] = \frac{n}{2d}. \]

Thus there exists a specific $S$ for which the number of vertices of $S$ minus the number
of edges in $S$ is at least $n/2d$. Select one vertex from each edge of $S$ and delete it.
This leaves a set $S^*$ with at least $n/2d$ vertices. All edges having been destroyed, $S^*$
is an independent set. ∎

The full result of Turán is given in The Probabilistic Lens: Turán's Theorem
(following Chapter 6).
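The alteration step in the proof of Theorem 3.2.1 can be sketched in code. To keep the example deterministic, the random choice of $S$ is replaced here by a brute-force search for the $S$ maximizing $|S|$ minus the number of edges inside $S$; that substitution is an assumption of the sketch, not the method of the proof, and it limits the example to very small graphs. Vertex labels $0,\ldots,n-1$ and the function name are illustrative.

```python
from itertools import combinations

def alteration_independent_set(n, edges):
    """Find S maximizing |S| - e(S) by brute force, then delete one vertex
    from each edge that still lies inside S."""
    best_score, best_set = -1, set()
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            s = set(subset)
            score = len(s) - sum(1 for u, v in edges if u in s and v in s)
            if score > best_score:
                best_score, best_set = score, s
    independent = set(best_set)
    for u, v in edges:
        if u in independent and v in independent:
            independent.discard(u)      # destroy this edge
    return independent

n = 8
edges = [(i, (i + 1) % n) for i in range(n)]      # the cycle C_8: nd/2 = 8, so d = 2
result = alteration_independent_set(n, edges)
print(sorted(result), n / (2 * 2))
```

Each deletion removes one vertex and at least one edge, so the final set has at least $|S| - e(S)$ vertices, which is the accounting used in the proof.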


3.3 COMBINATORIAL GEOMETRY 


For a set $S$ of $n$ points in the unit square $U$, let $T(S)$ be the minimum area of a
triangle whose vertices are three distinct points of $S$. Put $T(n) = \max T(S)$, where
$S$ ranges over all sets of $n$ points in $U$. Heilbronn conjectured that $T(n) = O(1/n^2)$.
This conjecture was disproved by Komlós, Pintz and Szemerédi (1982) who showed,
by a rather involved probabilistic construction, that there is a set $S$ of $n$ points in $U$
such that $T(S) = \Omega(\log n/n^2)$. As this argument is rather complicated, we only
present here a simpler one showing that $T(n) = \Omega(1/n^2)$.

Theorem 3.3.1 There is a set $S$ of $n$ points in the unit square $U$ such that $T(S) \ge 1/(100n^2)$.


Proof. We first make a calculation. Let $P, Q, R$ be independently and uniformly
selected from $U$ and let $\mu = \mu(PQR)$ denote the area of the triangle $PQR$. We
bound $\Pr[\mu \le \epsilon]$ as follows. Let $x$ be the distance from $P$ to $Q$ so that

\[ \Pr[b \le x \le b + \Delta b] \le \pi(b + \Delta b)^2 - \pi b^2 \]

and in the limit $\Pr[b \le x \le b + db] \le 2\pi b\,db$. Given $P, Q$ at distance $b$, the altitude
from $R$ to the line $PQ$ must have height $h \le 2\epsilon/b$ and so $R$ must lie in a strip of
width $4\epsilon/b$ and length at most $\sqrt{2}$. This occurs with probability at most $4\sqrt{2}\epsilon/b$. As
$0 \le b \le \sqrt{2}$ the total probability is bounded by

\[ \int_0^{\sqrt{2}} (2\pi b)(4\sqrt{2}\epsilon/b)\,db = 16\pi\epsilon. \]

Now let $P_1,\ldots,P_{2n}$ be selected uniformly and independently in $U$ and let $X$
denote the number of triangles $P_iP_jP_k$ with area less than $1/(100n^2)$. For each
particular $i, j, k$ the probability of this occurring is less than $0.6n^{-2}$ and so

\[ E[X] \le \binom{2n}{3}(0.6n^{-2}) < n. \]

Thus there exists a specific set of $2n$ vertices with fewer than $n$ triangles of area less
than $1/(100n^2)$. Delete one vertex from the set from each such triangle. This leaves
at least $n$ vertices and now no triangle has area less than $1/(100n^2)$. ∎

We note the following construction of Erdős showing $T(n) \ge 1/(2(n-1)^2)$ with
$n$ prime. On $[0,n-1] \times [0,n-1]$ consider the $n$ points $(x, x^2)$, where $x^2$ is reduced
mod $n$ (more formally, $(x,y)$ where $y = x^2 \bmod n$ and $0 \le y < n$). If some three
points of this set were collinear they would lie on a line $y = mx + b$ and $m$ would be
a rational number with denominator less than $n$. But then in $Z_n^2$ the parabola $y = x^2$
would intersect the line $y = mx + b$ at three points, so that the quadratic $x^2 - mx - b$
would have three distinct roots, an impossibility. Triangles between lattice points in
the plane have as their areas either half-integers or integers, hence the areas must be
at least 1/2. Contracting the plane by an $n - 1$ factor in both coordinates gives the
desired set. While this gem does better than Theorem 3.3.1 it does not lead to the
improvements of Komlós, Pintz and Szemerédi.
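The construction of Erdős is simple to verify by direct computation for small primes. The sketch below builds the points $(x, x^2 \bmod n)$ and computes twice the area of every triangle, which is an integer for lattice points; the function names are illustrative, and primality of $n$ is assumed rather than checked.

```python
from itertools import combinations

def erdos_points(n):
    """The points (x, x^2 mod n) for 0 <= x < n; n is assumed to be prime."""
    return [(x, (x * x) % n) for x in range(n)]

def doubled_area(p, q, r):
    """Twice the area of triangle pqr (an integer for lattice points)."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1]))

def min_doubled_area(points):
    """Smallest doubled area over all triples; zero means three collinear points."""
    return min(doubled_area(p, q, r) for p, q, r in combinations(points, 3))

for n in (5, 7, 11, 13):
    print(n, min_doubled_area(erdos_points(n)))
```

A doubled area of at least 1 for every triple means no three points are collinear and every triangle has area at least 1/2 before the plane is contracted.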


3.4 PACKING 


Let $C$ be a bounded measurable subset of $R^d$ and let $B(x)$ denote the cube $[0,x]^d$ of
side $x$. A packing of $C$ into $B(x)$ is a family of mutually disjoint copies of $C$, all
lying inside $B(x)$. Let $f(x)$ denote the largest size of such a family. The packing
constant $\delta = \delta(C)$ is defined by

\[ \delta(C) = \mu(C) \lim_{x \to \infty} f(x)x^{-d}, \]

where $\mu(C)$ is the measure of $C$. This is the maximal proportion of space that may
be packed by copies of $C$. (This limit can be proved always to exist but even without
that result the following result holds with lim replaced by lim inf.)

Theorem 3.4.1 Let $C$ be bounded, convex and centrally symmetric around the origin.
Then $\delta(C) \ge 2^{-d-1}$.


Proof. Let $P, Q$ be selected independently and uniformly from $B(x)$ and consider the
event $(C + P) \cap (C + Q) \ne \emptyset$. For this to occur we must have, for some $c_1, c_2 \in C$,

\[ P - Q = c_1 - c_2 = 2\,\frac{c_1 - c_2}{2} \in 2C \]

by central symmetry and convexity. The event $P \in Q + 2C$ has probability at most
$\mu(2C)x^{-d}$ for each given $Q$, hence

\[ \Pr[(C + P) \cap (C + Q) \ne \emptyset] \le \mu(2C)x^{-d} = 2^dx^{-d}\mu(C). \]

Now let $P_1,\ldots,P_n$ be selected independently and uniformly from $B(x)$ and let $X$
be the number of $i < j$ with $(C + P_i) \cap (C + P_j) \ne \emptyset$. From linearity of expectation,

\[ E[X] \le \frac{n^2}{2}2^dx^{-d}\mu(C). \]

Hence there exists a specific choice of $n$ points with fewer than that many intersecting
copies of $C$. For each $P_i, P_j$ with $(C + P_i) \cap (C + P_j) \ne \emptyset$ remove either $P_i$ or
$P_j$ from the set. This leaves at least $n - (n^2/2)2^dx^{-d}\mu(C)$ nonintersecting copies
of $C$. Set $n = x^d2^{-d}/\mu(C)$ to maximize this quantity, so that there are at least
$x^d2^{-d-1}/\mu(C)$ nonintersecting copies of $C$. These do not all lie inside $B(x)$ but,
letting $w$ denote an upper bound on the absolute values of the coordinates of the
points of $C$, they do all lie inside a cube of side $x + 2w$. Hence

\[ f(x + 2w) \ge x^d2^{-d-1}/\mu(C) \]

and so

\[ \delta(C) \ge \lim_{x \to \infty} \mu(C)f(x + 2w)(x + 2w)^{-d} \ge 2^{-d-1}. \] ∎

A simple greedy algorithm does somewhat better. Let $P_1,\ldots,P_m$ be any maximal
subset of $[0,x]^d$ with the property that the sets $C + P_i$ are disjoint. We have seen
that $C + P_i$ overlaps $C + P$ if and only if $P \in 2C + P_i$. Hence the sets $2C + P_i$
must cover $[0,x]^d$. As each such set has measure $\mu(2C) = 2^d\mu(C)$ we must have
$m \ge x^d2^{-d}/\mu(C)$. As before, all sets $C + P_i$ lie in a cube of side $x + 2w$, $w$ a
constant, so that

\[ f(x + 2w) \ge m \ge x^d2^{-d}/\mu(C) \]

and so

\[ \delta(C) \ge 2^{-d}. \]

A still further improvement appears in The Probabilistic Lens: Efficient Packing
(following Chapter 14).


3.5 RECOLORING 


Suppose that a random coloring leaves a set of blemishes. Here we apply a random 
recoloring to the blemishes to remove them. If the recoloring is too weak then not 
all the blemishes are removed. If the recoloring is too strong then new blemishes 
are created. The recoloring is given a parameter p and these two possibilities are 
decreasing and increasing functions of p. Calculus then points us to the optimal p. 

We use the notation of Section 1.3 on property B: m(n) > m means that given 
any n-uniform hypergraph H = (V, £) with m edges there exists a two-coloring of 
V so that no edge is monochromatic. Beck (1978) improved Erdos’ 1963 bound to 
m(n) = 2(2"n'/3). Building on his methods, Radhakrishnan and Srinivasan (2000) 
proved m(n) = 2(2"(n/Inn)!/*) and it is that proof we shall give. While this 
proof is neither long nor technically complex it has a number of subtle and beautiful 
steps and it is not surprising that it took more than thirty-five years to find it. That 
said, the upper and lower bounds on m(n) remain quite far apart! 


Theorem 3.5.1 If there exists p © [0,1] with k(1 — p)” + k?p < 1 then m(n) > 
Qn Tk. 


Corollary 3.5.2, m(n) = 0 (2"(n/Inn)}/2). 
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Proof. Bound 1 — p < e~?. The function ke~?" + k*p is minimized at p = 
In(n/k)/n. Substituting back in, if 


2 

os (1+ In(n/k)) <1 

then the condition of Theorem 3.5.1 holds. This inequality is true when k = 
c(n/Inn)!/? for any c < V2 with n sufficiently large. a 


The condition of Theorem 3.5.1 is somewhat typical; one wants the total failure
probability to be less than 1 and there are two types of failure. Oftentimes one
finds reasonable bounds by requiring the stronger condition that each failure type
has probability less than one-half. Here $k^2 p < \frac12$ gives $p < 1/(2k^2)$. Plugging the
maximal possible $p$ into the second inequality $k(1-p)^n < \frac12$ (and again bounding
$1 - p \le e^{-p}$) gives $2k^2 \ln(2k) < n$.
This again holds when $k = c(n/\ln n)^{1/2}$ though now we have the weaker condition
$c < 1$. We recommend this rougher approach as a first attempt at a problem, when
the approximate range of the parameters is still in doubt. The refinements of calculus
can be placed in the published work!
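The two approaches can be compared numerically. The following is a minimal sketch, not part of the original argument: the function names and the grid width `step` are assumptions made for illustration. It searches for the largest $k$ satisfying the condition of Theorem 3.5.1 with the calculus choice of $p$, and the largest $k$ allowed by the rougher one-half approach.

```python
import math

def failure_bound(k: float, p: float, n: int) -> float:
    """Left-hand side k(1-p)^n + k^2 p of the condition in Theorem 3.5.1."""
    return k * (1.0 - p) ** n + k * k * p

def calculus_p(k: float, n: int) -> float:
    """The choice p = ln(n/k)/n from the proof of Corollary 3.5.2, clamped to [0, 1]."""
    return min(1.0, max(0.0, math.log(n / k) / n))

def largest_k(n: int, step: float = 0.01) -> float:
    """Walk up a grid of width `step` while the condition of Theorem 3.5.1 holds."""
    k = 0.0
    while failure_bound(k + step, calculus_p(k + step, n), n) < 1.0:
        k += step
    return k

def rough_k(n: int, step: float = 0.01) -> float:
    """Walk up a grid while 2 k^2 ln(2k) < n (the rougher one-half approach)."""
    k = 0.5  # ln(2k) = 0 here, so the inequality holds trivially
    while 2.0 * (k + step) ** 2 * math.log(2.0 * (k + step)) < n:
        k += step
    return k
```

Dividing either value by $(n/\ln n)^{1/2}$ gives the effective constant $c$ for that $n$; for moderate $n$ it need not yet be close to its asymptotic limit.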


Proof [Theorem 3.5.1]. Fix $H = (V,E)$ with $m = 2^{n-1}k$ edges and $p$ satisfying
the condition. We describe a randomized algorithm that yields a coloring of $V$. It
is best to preprocess the randomness: Each $v \in V$ flips a first coin, which comes
up heads with probability $\frac12$, and a second coin, which comes up heads (representing
potential recoloration) with probability $p$. In addition (and importantly), the vertices
of $V$ are ordered randomly.

Step 1. Color each $v \in V$ red if its first coin was heads, otherwise blue. Call this
the first coloring. Let $D$ (for dangerous) denote the set of $v \in V$ that lie in some
(possibly many) monochromatic $e \in E$.

Step 2. Consider the elements of $D$ sequentially in the (random) order of $V$.
When $d$ is being considered call it still dangerous if there is some (possibly many)
$e \in H$ containing $d$ that was monochromatic in the first coloring and for which no
vertices have yet changed color. If $d$ is not still dangerous then do nothing. But if
it is still dangerous then check its second coin. If it is heads then change the color
of $d$, otherwise do nothing. We call the coloring at the time of termination the final
coloring.
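Steps 1 and 2 translate almost line by line into code. The sketch below is one possible rendering, under the assumption that the hypergraph is given as a list of vertices and a list of edges (tuples of vertices); the names `recolor` and `is_proper` are illustrative and do not come from the text.

```python
import random

def recolor(vertices, edges, p, rng=random):
    """One run of the two-step recoloring procedure; returns the final coloring."""
    # Preprocess the randomness: two coins per vertex and a random ordering.
    first = {v: rng.random() < 0.5 for v in vertices}   # True = red, False = blue
    second = {v: rng.random() < p for v in vertices}    # True = recolor if asked
    order = list(vertices)
    rng.shuffle(order)

    # Step 1: the first coloring, its monochromatic edges, and the dangerous set D.
    color = dict(first)
    mono = [e for e in edges if len({first[v] for v in e}) == 1]
    dangerous = {v for e in mono for v in e}
    changed = set()

    # Step 2: scan D in the random order of V.
    for d in order:
        if d not in dangerous:
            continue
        # d is still dangerous if some first-coloring-monochromatic edge through d
        # has had no vertex change color so far.
        still_dangerous = any(d in e and not (changed & set(e)) for e in mono)
        if still_dangerous and second[d]:
            color[d] = not color[d]
            changed.add(d)
    return color

def is_proper(color, edges):
    """True if no edge is monochromatic under `color`."""
    return all(len({color[v] for v in e}) == 2 for e in edges)
```

A single run may fail; the theorem only asserts that the failure probability is below 1, so in practice one would repeat `recolor` until `is_proper` returns true.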

We say the algorithm fails if some $e \in H$ is monochromatic in the final coloring.
We shall bound the failure probability by $k(1-p)^n + k^2 p$. The assumption of
Theorem 3.5.1 then assures us that with positive probability the algorithm succeeds.
This, by our usual magic, means that there is some running of the algorithm which
yields a final coloring with no monochromatic $e$; that is, there exists a two-coloring
of $V$ with no monochromatic edge. For convenience, we bound the probability that
some $e \in H$ is red in the final coloring; the failure probability for the algorithm is at
most twice that.

An $e \in E$ can be red in the final coloring in two ways. Either $e$ was red in the first
coloring and remained red through to the final coloring or $e$ was not red in the first
coloring but was red in the final coloring (the structure of the algorithm assures us
that vertices cannot change color more than once). Let $A_e$ be the first event and $C_e$
the second. Then

\[ \Pr[A_e] = 2^{-n}(1-p)^n . \]

The first factor is the probability $e$ is red in the first coloring, that all first coins of $e$
came up heads. The second factor is the probability that all second coins came up
tails. If they all did, then no $v \in e$ would be recolored in Step 2. Conversely, if any
second coins of $v \in e$ came up heads there would be a first $v$ (in the ordering) that
came up heads. When it did $v$ was still dangerous as $e$ was still monochromatic and
so $v$ does look at its second coin and change its color. We have

\[ 2 \sum_{e \in H} \Pr[A_e] = k(1-p)^n , \]

giving the first addend of our failure probability.

In Beck's 1978 proof, given in our first edition, there was no notion of “still
dangerous”: every $d \in D$ changed its color if and only if its second coin was
heads. The values $\Pr[A_e] = 2^{-n}(1-p)^n$ are the same in both arguments. Beck's
argument had bounded $\Pr[C_e] \le k^2 p e^{pn}$. The new argument avoids excessive recoloration
and leads to a better bound on $\Pr[C_e]$. We turn to the ingenious bounding of $\Pr[C_e]$.

For distinct $e, f \in E$ we say $e$ blames $f$ if:


• $e, f$ overlap in precisely one element. Call it $v$.
• In the first coloring $f$ was blue and in the final coloring $e$ was red.
• In Step 2 $v$ was the last vertex of $e$ that changed color from blue to red.
• When $v$ changed its color $f$ was still entirely blue.


Suppose $C_e$ holds. Some points of $e$ changed color from blue to red so there is a
last point $v$ that did so. But why did $v$ flip its coin? It must have been still dangerous.
That is, $v$ must be in some (perhaps many) set $f$ that was blue in the first coloring
and was still blue when $v$ was considered. Can $e, f$ overlap in another vertex $v'$?
No! For such a $v'$ would necessarily have been blue in the first coloring (as $v' \in f$)
and red in the final coloring (as $v' \in e$), but then $v'$ changed color before $v$. Hence $f$
was no longer entirely blue when $v$ was considered, contradicting the assumption on
$f$. Therefore, when $C_e$ holds, $e$ blames some $f$. Let $B_{ef}$ be the event that $e$ blames
$f$. Then $\sum_e \Pr[C_e] \le \sum_{e \ne f} \Pr[B_{ef}]$. As there are fewer than $(2^{n-1}k)^2$ pairs $e \ne f$
it now suffices to bound $\Pr[B_{ef}] \le 2^{1-2n}p$.

Let $e, f$ with $e \cap f = \{v\}$ (otherwise $B_{ef}$ cannot occur) be fixed. The random
ordering of $V$ induces a random ordering $\sigma$ of $e \cup f$. Let $i = i(\sigma)$ denote the number
of $v' \in e$ coming before $v$ in the ordering and let $j = j(\sigma)$ denote the number of
$v' \in f$ coming before $v$ in the ordering. Fixing $\sigma$ we claim

\[ \Pr[B_{ef} \mid \sigma] \le \frac{p}{2}\, 2^{-n+1} (1-p)^j\, 2^{-n+1+i} \left(\frac{1+p}{2}\right)^i . \]

Let's take the factors one at a time. First, $v$ itself must start blue and turn red. Second,
all other $v' \in f$ must start blue. Third, all $v' \in f$ coming before $v$ must have second
coin tails. Fourth, all $v' \in e$ coming after $v$ must start red (since $v$ is the last point of
$e$ to change color). Finally, all $v' \in e$ coming before $v$ must either start red or start
blue and turn red. [The final factor may well be a substantial overestimate. Those
$v' \in e$ coming before $v$ which start blue must not only have second coin heads but
must themselves lie in an $e' \in H$ monochromatic under the first coloring. Attempts
to further improve bounds on $m(n)$ have often centered on this overestimate but (thus
far!) to no avail.]
We can then write

\[ \Pr[B_{ef}] \le 2^{1-2n} p\, E\left[(1+p)^i (1-p)^j\right] , \]

where the expectation is over the uniform choice of $\sigma$. The following gem therefore
completes the argument.


Lemma 3.5.3 $E[(1+p)^i(1-p)^j] \le 1$.


Proof. Fix a matching between $e - \{v\}$ and $f - \{v\}$; think of Mr. & Mrs. Jones,
Mr. & Mrs. Smith, and so on. Condition on how many of each pair (two Joneses,
one Smith, no Taylors, etc.) come before $v$. This splits the space into $3^{n-1}$ parts,
and it suffices to show that the conditional expectation in each of them is at most
1. Indeed, the expected factor contributed to $(1+p)^i(1-p)^j$ from each pair is at most 1, as
follows: when there is no Taylor there is no factor. When there are two Joneses there
is a factor $(1+p)(1-p) \le 1$. When there is one Smith the factor is equally likely to
be $1 + p$ (Brad) or $1 - p$ (Angelina), giving an expected factor of one. Moreover, these factors
are independent for different pairs (given the above conditioning). All expected factors are at
most one, and hence so is the expectation of their product. ∎


The desired result follows. ∎
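For small $n$ the expectation in Lemma 3.5.3 can be computed exactly by enumerating all orderings of $e \cup f$. The sketch below is only a sanity check of the lemma, not a substitute for its proof; the labelling of points and the function name are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import permutations

def lemma_expectation(n: int, p: Fraction) -> Fraction:
    """Exact E[(1+p)^i (1-p)^j] over uniform orderings of e ∪ f,
    where |e| = |f| = n and e ∩ f = {v}."""
    # 'v' is the common vertex; ('e', t) and ('f', t) label the remaining points.
    points = ['v'] + [('e', t) for t in range(n - 1)] + [('f', t) for t in range(n - 1)]
    total = Fraction(0)
    count = 0
    for sigma in permutations(points):
        before = sigma[:sigma.index('v')]
        i = sum(1 for x in before if x[0] == 'e')   # points of e before v
        j = sum(1 for x in before if x[0] == 'f')   # points of f before v
        total += (1 + p) ** i * (1 - p) ** j
        count += 1
    return total / count
```

The enumeration has $(2n-1)!$ terms, so this is practical only for very small $n$.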


3.6 CONTINUOUS TIME 


Discrete random processes can sometimes be analyzed by placing them in a con- 
tinuous time framework. This allows the powerful methods of analysis (such as 
integration!) to be applied. The approach seems most effective when dealing with 
random orderings. We give two examples. 


Property B. We modify the proof that $m(n) = \Omega(2^n n^{1/2} \ln^{-1/2} n)$ of the previous
section. We assign to each vertex $v \in V$ a “birth time” $x_v$. The $x_v$ are independent
real variables, each uniform in $[0, 1]$. The ordering of $V$ is then the ordering (under
less than) of the $x_v$. We now claim

\[ \Pr[B_{ef}] \le \sum_{l=0}^{n-1} \binom{n-1}{l} 2^{1-2n} \int_0^1 x^l p^{l+1} (1-xp)^{n-1}\, dx . \]

For $T \subseteq e - \{v\}$ let $B_{efT}$ be the event that $B_{ef}$ holds and in the first coloring $e$ had
precisely $T \cup \{v\}$ blue. There are $\binom{n-1}{l}$ choices for an $l$-set $T$, with $l$ ranging from
0 to $n - 1$. The first coloring on $e \cup f$ is then determined and has probability $2^{1-2n}$
of occurring. Suppose $v$ has birth time $x_v = x$. All $w \in T \cup \{v\}$ must have second
coin flip heads, which has probability $p^{l+1}$. All $w \in T$ must be born before $v$, so that
$x_w < x$, which has probability $x^l$. No $w \in f - \{v\}$ can be born before $v$ and have
coin flip heads. Each such $w$ has probability $xp$ of doing that so there is probability
$(1 - xp)^{n-1}$ that no $w$ does. As $x_v = x$ was uniform in $[0, 1]$ we integrate over $x$.
Recombining terms,

\[ \Pr[B_{ef}] \le 2^{1-2n} p \int_0^1 (1+xp)^{n-1}(1-xp)^{n-1}\, dx . \]

The integrand is always at most one so $\Pr[B_{ef}] \le 2^{1-2n}p$. The remainder of the
proof is unchanged.


Random Greedy Packing. Let $H$ be a $(k+1)$-uniform hypergraph on a vertex
set $V$ of size $N$. The $e \in H$, which we call edges, are simply subsets of $V$ of size
$k + 1$. We assume:

Degree Condition: Every $v \in V$ is in precisely $D$ edges.

Codegree Condition: Every distinct pair $v, v' \in V$ have only $o(D)$ edges in common.

We think of $k$ fixed ($k = 2$ being an illustrative example) and the asymptotics as
$N, D \to \infty$, with no set relationship between $N$ and $D$.

A packing is a family $P$ of vertex disjoint edges $e \in H$. Clearly $|P| \le N/(k+1)$.
We define a randomized algorithm to produce a (not necessarily optimal) packing.
Assign to each $e \in H$ uniformly and independently a birth time $t_e \in [0,D)$. [The
choice of $[0, D)$ rather than $[0, 1]$ proves to be a technical convenience. Note that as
the $t_e$ are real variables with probability one there are no ties.] At time zero $P \leftarrow \emptyset$.
As time progresses from 0 to $D$, when an edge $e$ is born it is added to $P$ if possible,
that is, unless there is already some $e' \in P$ that overlaps $e$. Let $P_c$ denote the value of
$P$ just before time $c$, when all $e$ with birth times $t_e < c$ have been examined. Set
$P^{FINAL} = P_D$. Note that by time $D$ all edges have been born and their births were
in random order. Thus $P^{FINAL}$ is identical to the discrete process, often called
the random greedy algorithm, in which $H$ is first randomly ordered and then the
$e \in H$ are considered sequentially.
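The continuous-time description of the random greedy algorithm is short enough to write down directly. The following is a minimal sketch, assuming edges are given as tuples of vertices; the function name and the use of Python's `random.uniform` for birth times are illustrative choices, not taken from the text.

```python
import random
from itertools import combinations

def random_greedy_packing(edges, D, rng=random):
    """Continuous-time random greedy packing; returns the list P_FINAL."""
    # Independent uniform birth times on [0, D]; ties have probability zero.
    births = sorted((rng.uniform(0.0, D), e) for e in edges)
    covered = set()
    packing = []
    for _, e in births:
        if covered.isdisjoint(e):      # e overlaps no edge already in P
            packing.append(e)
            covered.update(e)
    return packing
```

As a usage example, for the complete 3-uniform hypergraph on 12 vertices ($k = 2$, $D = \binom{11}{2} = 55$) one could call `random_greedy_packing(list(combinations(range(12), 3)), 55)`; since any three uncovered vertices form an edge there, every run ends with a packing of four triples.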


Theorem 3.6.1 [Spencer (1995)] The expected value of $|P^{FINAL}|$ is asymptotic to
$N/(k + 1)$.


We say $v \in V$ survives at time $c$ if no $e \in P_c$ contains $v$ and we let $S_c$ denote
the set of $v \in V$ so surviving. Rather than looking at $P^{FINAL}$ we shall examine $P_c$,
where $c$ is an arbitrary fixed nonnegative real. Let

\[ f(c) = \lim \Pr[v \in S_c] , \]

where, formally, we mean here that for all $\varepsilon > 0$ there exist $D_0$, $N_0$ and $\delta > 0$ so that
if $H$ is $(k+1)$-uniform on $N > N_0$ vertices with each $v$ in $D > D_0$ edges and every
distinct pair $v, v' \in V$ has less than $\delta D$ common edges then $|f(c) - \Pr[v \in S_c]| < \varepsilon$
for all $v \in V$.

The heart of the argument lies in showing that $f(c)$ exists by defining a continuous
time birth process yielding that value. We now describe the birth process, omitting
some of the epsilondeltamanship needed to formally show the limit.

Our birth process starts at time $c$ and time goes backwards to 0. It begins with
root Eve, our anthropomorphized $v$. Eve has births in time interval $[0,c)$. The
number of births is given by a Poisson distribution with mean $c$ and given their
number their times are uniformly and independently distributed. [This is a standard
Poisson process with intensity one. Equivalently, on any infinitesimal time interval
$[x, x + dx)$, Eve has probability $dx$ of giving birth and these events are independent
over disjoint intervals.] Our fertile Eve always gives birth to $k$-tuplets. Each child is
born fertile under the same rules, so if Alice is born at time $x$ she (in our unisexual
model) has a Poisson distribution with mean $x$ of births, uniformly distributed in
$[0, x)$.

The resulting random tree $T = T_c$ can be shown to be finite (note the time interval
is finite) with probability 1. Given a finite $T$ we say for each vertex Alice that Alice
survives or dies according to the following scheme.

Menendez Rule: If Alice has given birth to a set (or possibly several sets) of
$k$-tuplets all of whom survived then she dies; otherwise she survives.

In particular, if Alice is childless she survives. We can then work our way up the
tree to determine of each vertex whether she survives or dies.


Example. c = 10,k = 2. Eve gives birth to Alice, Barbara at time 8.3 and then to 
Rachel, Siena at time 4.3. Alice gives birth to Nancy, Olive at time 5.7 and Rachel 
gives birth to Linda, Mayavati at time 0.4. There are no other births. Leaves Nancy, 
Olive, Linda, Mayavati, Barbara and Siena then survive. Working up the tree Alice 
and Rachel die. In neither of Eve’s births did both children survive and therefore Eve 
survives. 
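The example can be checked mechanically. In the sketch below the tree is encoded as a dictionary from each mother to her list of litters; this encoding and the function name are assumptions made for illustration. Birth times are omitted because the Menendez Rule depends only on the shape of the tree.

```python
def survives(node, children):
    """Menendez Rule: a node dies iff some litter of hers survives in full."""
    for litter in children.get(node, []):
        if all(survives(child, children) for child in litter):
            return False
    return True   # in particular, a childless node survives

# The example tree with c = 10, k = 2: each mother maps to her litters.
example = {
    "Eve": [("Alice", "Barbara"), ("Rachel", "Siena")],
    "Alice": [("Nancy", "Olive")],
    "Rachel": [("Linda", "Mayavati")],
}
```

Calling `survives("Eve", example)` walks the tree from the leaves upward, exactly as in the text.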

We define $f(c)$ to be the probability that the root Eve survives in the random birth
tree $T = T_c$.

We outline the equivalence by defining a tree $T = T_c(v)$ for $v \in V$. For each edge
$e$ containing $v$ with birth time $t = t_e < c$ we say that $e - \{v\}$ is a set of $k$-tuplets born
to $v$ at time $t$. We work recursively; if $w$ is born at time $t$ then for each $e'$ containing
$w$ with birth time $t' = t_{e'} < t$ we say that $e' - \{w\}$ is a set of $k$-tuplets born to $w$
at time $t'$. Possibly this process does not give a tree since the same vertex $w$ may be
reached in more than one way; the simplest example is if $v \in e, e'$ where both have
birth times less than $c$ and $e, e'$ share another common vertex $w$. Then the process is
stillborn and $T_c(v)$ is not defined. We'll argue that for any particular tree $T$,

\[ \lim \Pr[T_c(v) \cong T] = \Pr[T_c = T] . \qquad (3.1) \]

As $\sum_T \Pr[T_c = T] = 1$ this gives a rather roundabout argument that the process
defining $T_c(v)$ is almost never stillborn.

We find $T_c(v)$ in stages. First consider the $D$ edges $e$ containing $v$. The number of
them with birth time $t_e < c$ has binomial distribution BIN$[D, c/D]$ which approaches
(critically) the Poisson distribution with mean $c$. Given that there are $l$ such $e$ their
birth times $t_e$ are uniformly distributed. There are (by the codegree condition) $o(D^2)$
pairs $e, e'$ containing $v$ and also some other vertex so there is probability $o(1)$ that
two such $e, e'$ have birth time less than $c$. Now suppose $T_c(v)$ has been built out
to a certain level and a vertex $w$ has been born at time $x$. There are only $o(D)$
common edges between $w$ and any of the finite number of $w'$ already born, so there
are still about $D$ edges $e$ containing $w$ and no other such $w'$. We now examine their
birth times: the number with $t_e < x$ has binomial distribution BIN$[D - o(D), x/D]$
which approaches the Poisson distribution with mean $x$. As above, almost surely no
two such $e, e'$ will have a common vertex other than $w$ itself. For any fixed $T$ the
calculation of $\Pr[T_c(v) \cong T]$ involves a finite number of these limits, which allows
us to conclude (3.1).
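The binomial-to-Poisson limit used twice above is easy to observe numerically. The sketch below computes an approximate total variation distance between BIN$[D, c/D]$ and the Poisson distribution with mean $c$; the truncation parameter `terms` and the function names are assumptions made for illustration.

```python
import math

def binomial_pmf(D, q, i):
    """Pr[BIN[D, q] = i]; math.comb returns 0 when i > D."""
    return math.comb(D, i) * q ** i * (1.0 - q) ** (D - i)

def poisson_pmf(c, i):
    """Pr[Poisson(c) = i]."""
    return math.exp(-c) * c ** i / math.factorial(i)

def total_variation(D, c, terms=40):
    """Approximate total variation distance between BIN[D, c/D] and Poisson(c),
    summing only over the values 0, ..., terms - 1."""
    q = c / D
    return 0.5 * sum(abs(binomial_pmf(D, q, i) - poisson_pmf(c, i))
                     for i in range(terms))
```

For fixed $c$ the distance shrinks as $D$ grows, which is the sense in which the number of births of $v$ becomes Poisson.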

With $c < d$ the random tree $T_d$ includes $T_c$ as a subtree by considering only those
births of Eve occurring in $[0,c)$. If Eve survives in $T_d$ she must survive in $T_c$. Hence
$f(d) \le f(c)$. We now claim

\[ \lim_{c \to \infty} f(c) = 0 . \]

If not, the nonincreasing $f$ would have a limit $L > 0$ and all $f(x) \ge L$. Suppose
in $T_c$ Eve had $i$ births. In each birth there would be probability at least $L^k$ that all $k$
children survived. The probability that Eve survived would then be at most $(1 - L^k)^i$.
Since the number of Eve's births is Poisson with mean $c$,

\[ f(c) \le \sum_{i=0}^{\infty} e^{-c}\frac{c^i}{i!}(1 - L^k)^i = e^{-L^k c} , \]

but then $\lim_{c \to \infty} f(c) = 0$, a contradiction.

By linearity of expectation $E[|S_c|] \sim f(c)N$. As $(k+1)|P_c| + |S_c| = N$,
$E[|P_c|] \sim (1 - f(c))N/(k+1)$. But $E[|P^{FINAL}|] \ge E[|P_c|]$. We make $f(c)$
arbitrarily small by taking $c$ appropriately big, so that
$E[|P^{FINAL}|] \ge (1 - o(1))N/(k+1)$. As $|P^{FINAL}| \le N/(k+1)$ always, the theorem follows.


Remark. We can actually say more about $f(c)$. For $\Delta c$ small, $f(c + \Delta c) - f(c) \sim
-(\Delta c) f(c)^{k+1}$ as, roughly, an Eve starting at time $c + \Delta c$ might have a birth in
time interval $[c, c + \Delta c)$, all of whose children survive, while Eve has no births
in $[0,c)$, all of whose children survive. Letting $\Delta c \to 0$ yields the differential
equation $f'(c) = -f(c)^{k+1}$. The initial value $f(0) = 1$ gives a unique solution
$f(c) = (1 + ck)^{-1/k}$. It is intriguing to plug in $c = D$. This is not justified as
our limit arguments were for $c$ fixed and $N, D \to \infty$. Nonetheless, that would yield
$E[|S_D|] = O(N D^{-1/k})$, that the random greedy algorithm would leave $O(N D^{-1/k})$
vertices uncovered. Suppose we replace the codegree condition by the stronger
condition that every distinct pair $v, v' \in V$ have at most one edge in common. There
is computer simulation data that in those cases the random greedy algorithm does
leave $O(N D^{-1/k})$ vertices uncovered. This remains an open question, though it is
shown in Alon, Kim and Spencer (1997) that this is the case for a modified version
of the greedy algorithm.
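The closed form $f(c) = (1+ck)^{-1/k}$ can be compared with a direct simulation of the birth process. The following is a Monte Carlo sketch only; the function names, the way Poisson variates are sampled, and the trial counts are assumptions made for illustration. Eve's litters are examined one at a time, and the recursion stops as soon as one litter survives in full, which does not change the answer because the subtrees are independent.

```python
import random

def poisson(mean, rng):
    """Sample Poisson(mean) by counting unit-rate exponential arrivals before `mean`."""
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count

def eve_survives(c, k, rng):
    """Grow the birth tree backwards from time c and apply the Menendez Rule."""
    for _ in range(poisson(c, rng)):
        birth_time = rng.uniform(0.0, c)
        # Eve dies if all k children of this litter (each born at birth_time) survive.
        if all(eve_survives(birth_time, k, rng) for _ in range(k)):
            return False
    return True

def estimate_f(c, k, trials, rng):
    """Fraction of simulated trees in which the root survives."""
    return sum(eve_survives(c, k, rng) for _ in range(trials)) / trials

def closed_form(c, k):
    """The solution f(c) = (1 + ck)^(-1/k) of f' = -f^(k+1), f(0) = 1."""
    return (1.0 + c * k) ** (-1.0 / k)
```

With a few thousand trials the estimate for moderate $c$ should agree with `closed_form(c, k)` up to sampling error.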




Corollary 3.6.2 Under the assumptions of the theorem there exists a packing $P$ of
size $\sim N/(k+1)$.


Proof. We have defined a random process that gives a packing with expected size
$\sim N/(k+1)$ and our usual magic implies such a $P$ must exist. ∎


In particular, this gives an alternate proof of the Erdős–Hanani conjecture, first
proved by Rödl as given in Section 4.7. We use the notation of that section and
define the packing number $m(n, k, l)$ as the maximal size of a family $F$ of $k$-element
subsets of $[n] = \{1, \ldots, n\}$ such that no $l$-set is contained in more than one $k$-set.
Define a hypergraph $H = H(n, k, l)$ as follows: The vertices of $H$ are the $l$-element
subsets of $[n]$. For each $k$-element $A \subseteq [n]$ we define an edge $e_A$ as the set of
$l$-element subsets of $A$. A family $F$ satisfying the above conditions then corresponds
to a packing $P = \{e_A : A \in F\}$ in $H$. $H$ has $N = \binom{n}{l}$ vertices. Each edge $e_A$
has size $K + 1 = \binom{k}{l}$. Each vertex is in $D = \binom{n-l}{k-l}$ edges. The number of edges
containing two vertices $v, v'$ depends on their intersection. It is largest (given $v \ne v'$)
when $v, v'$ (considered as $l$-sets) overlap in $l - 1$ points and then it is $\binom{n-l-1}{k-l-1}$. We
assume (as in Section 4.7) that $k, l$ are fixed and $n \to \infty$ so this number of common
edges is $o(D)$. The assumptions of Section 4.7 give $K + 1$ fixed, $N, D \to \infty$ so that
there exists $P$ with

\[ m(n,k,l) = |P| \sim N/(K+1) \sim \binom{n}{l} \Big/ \binom{k}{l} . \]
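The parameters of $H(n,k,l)$ are simple binomial coefficients, so the degree and codegree conditions can be inspected for concrete values. The sketch below is illustrative only; the function name is an assumption.

```python
from math import comb

def packing_hypergraph_parameters(n, k, l):
    """Return (N, K + 1, D, maximum codegree) for the hypergraph H(n, k, l)."""
    N = comb(n, l)                          # vertices: the l-subsets of [n]
    edge_size = comb(k, l)                  # K + 1: the l-subsets of one k-set
    D = comb(n - l, k - l)                  # k-sets containing a given l-set
    codegree = comb(n - l - 1, k - l - 1)   # k-sets containing two l-sets that share l - 1 points
    return N, edge_size, D, codegree
```

For instance, `packing_hypergraph_parameters(7, 3, 2)` describes packings of triples in which no pair is covered twice. With $k, l$ fixed the ratio of the codegree to $D$ tends to zero as $n$ grows, as the argument requires.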


3.7 EXERCISES 


1. As shown in Section 3.1, the Ramsey number $R(k, k)$ satisfies

\[ R(k,k) > n - \binom{n}{k} 2^{1-\binom{k}{2}} \]

for every integer $n$. Conclude that

\[ R(k,k) \ge (1 - o(1))\frac{k}{e} 2^{k/2} . \]


2. Prove that the Ramsey number $R(4, k)$ satisfies

\[ R(4,k) \ge \Omega((k/\ln k)^2) . \]


3. Prove that every three-uniform hypergraph with $n$ vertices and $m \ge n/3$ edges
contains an independent set (i.e., a set of vertices containing no edges) of size
at least

\[ \frac{2n^{3/2}}{3\sqrt{3}\sqrt{m}} . \]

4. (*) Show that there is a finite $n_0$ such that any directed graph on $n > n_0$
vertices in which each outdegree is at least $\log_2 n - \frac{1}{10}\log_2 \log_2 n$ contains an
even simple directed cycle.


THE PROBABILISTIC LENS: 


High Girth and 
High Chromatic Number 


Many consider this one of the most pleasing uses of the probabilistic method, as the
result is surprising and does not appear to call for nonconstructive techniques. The
girth of a graph $G$ is the size of its shortest cycle, $\alpha(G)$ is the size of the largest
independent set in $G$ and $\chi(G)$ denotes its chromatic number.


Theorem 1 [Erdős (1959)] For all $k, l$ there exists a graph $G$ with girth$(G) > l$
and $\chi(G) > k$.


Proof. Fix $\theta < 1/l$ and let $G \sim G(n,p)$ with $p = n^{\theta - 1}$; that is, $G$ is a random
graph on $n$ vertices chosen by picking each pair of vertices as an edge randomly and
independently with probability $p$. Let $X$ be the number of cycles of size at most $l$.
Then, writing $(n)_i = n(n-1)\cdots(n-i+1)$,

\[ E[X] = \sum_{i=3}^{l} \frac{(n)_i}{2i} p^i \le \sum_{i=3}^{l} \frac{n^{\theta i}}{2i} = o(n) \]

as $\theta l < 1$. In particular,

\[ \Pr[X \ge n/2] = o(1) . \]

Set $x = \lceil (3/p) \ln n \rceil$ so that

\[ \Pr[\alpha(G) \ge x] \le \binom{n}{x}(1-p)^{\binom{x}{2}} < \left[ n e^{-p(x-1)/2} \right]^x = o(1) . \]

Let $n$ be sufficiently large so that both these events have probability less than 0.5.
Then there is a specific $G$ with less than $n/2$ cycles of length at most $l$ and with
$\alpha(G) < 3n^{1-\theta} \ln n$. Remove from $G$ a vertex from each cycle of length at most $l$.
This gives a graph $G^*$ with at least $n/2$ vertices. $G^*$ has girth greater than $l$ and
$\alpha(G^*) \le \alpha(G)$. Thus

\[ \chi(G^*) \ge \frac{|G^*|}{\alpha(G^*)} \ge \frac{n/2}{3n^{1-\theta}\ln n} = \frac{n^{\theta}}{6 \ln n} . \]

To complete the proof, let $n$ be sufficiently large so that this is greater than $k$. ∎
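The two quantities driving the proof can be evaluated for concrete parameters. The sketch below computes the expected number of short cycles and the resulting lower bound on the chromatic number; the function names are assumptions made for illustration, and floating-point arithmetic is used throughout.

```python
import math

def expected_short_cycles(n, l, theta):
    """E[X] = sum_{i=3}^{l} (n)_i / (2i) * p^i with p = n^(theta - 1)."""
    p = n ** (theta - 1.0)
    total = 0.0
    for i in range(3, l + 1):
        falling = 1.0
        for t in range(i):
            falling *= (n - t)          # (n)_i = n (n-1) ... (n-i+1)
        total += falling / (2 * i) * p ** i
    return total

def chromatic_lower_bound(n, theta):
    """The bound n^theta / (6 ln n) on the chromatic number of G*."""
    return n ** theta / (6.0 * math.log(n))
```

When `expected_short_cycles(n, l, theta)` is below $n/4$, Markov's inequality gives $\Pr[X \ge n/2] < 1/2$. Note how slowly the chromatic bound grows: the argument is asymptotic, and $n$ must be very large before the bound exceeds even a modest $k$.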


The Second Moment 


You don’t have to believe in God but you should believe in The Book. 
- Paul Erdős


4.1 BASICS 


After the expectation the most vital statistic for a random variable $X$ is the variance.
We denote it $\mathrm{Var}[X]$. It is defined by

\[ \mathrm{Var}[X] = E\left[(X - E[X])^2\right] \]

and measures how spread out $X$ is from its expectation. We shall generally, following
standard practice, let $\mu$ denote expectation and $\sigma^2$ denote variance. The positive
square root $\sigma$ of the variance is called the standard deviation. With this notation,
here is our basic tool.


Theorem 4.1.1 [Chebyshev's Inequality] For any positive $\lambda$,

\[ \Pr[|X - \mu| \ge \lambda\sigma] \le \frac{1}{\lambda^2} . \]


Proof. $\sigma^2 = \mathrm{Var}[X] = E[(X - \mu)^2] \ge \lambda^2\sigma^2 \Pr[|X - \mu| \ge \lambda\sigma]$. ∎


The Probabilistic Method, Third Edition By Noga Alon and Joel Spencer 
Copyright © 2008 John Wiley & Sons, Inc.

The use of Chebyshev's Inequality is called the second moment method.

Chebyshev's Inequality is best possible when no additional restrictions are placed
on $X$ as $X$ may be $\mu + \lambda\sigma$ and $\mu - \lambda\sigma$ with probability $1/(2\lambda^2)$ each and otherwise $\mu$. Note,
however, that when $X$ is a normal distribution with mean $\mu$ and standard deviation $\sigma$
then

\[ \Pr[|X - \mu| \ge \lambda\sigma] = 2\int_{\lambda}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\, dt , \]

and for $\lambda$ large this quantity is asymptotically $\sqrt{2/\pi}\, e^{-\lambda^2/2}/\lambda$, which is significantly
smaller than $1/\lambda^2$. In Chapters 7 and 8 we shall see examples where $X$ is the sum of
“nearly independent” random variables and these better bounds can apply.
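The gap between Chebyshev's bound and the normal tail is easy to tabulate. The sketch below uses the standard identity that the two-sided standard normal tail equals `erfc(lam / sqrt(2))`; the function names are assumptions made for illustration.

```python
import math

def chebyshev_bound(lam):
    """The bound 1 / lambda^2 of Theorem 4.1.1."""
    return 1.0 / lam ** 2

def normal_tail(lam):
    """Pr[|X - mu| >= lam * sigma] for normal X: twice the upper tail integral."""
    return math.erfc(lam / math.sqrt(2.0))

def normal_tail_asymptotic(lam):
    """The large-lambda approximation sqrt(2/pi) * exp(-lam^2 / 2) / lam."""
    return math.sqrt(2.0 / math.pi) * math.exp(-lam * lam / 2.0) / lam

def three_point_tail(lam):
    """The extremal X: mu ± lam*sigma with probability 1/(2 lam^2) each, else mu.
    Returns (Pr[|X - mu| >= lam*sigma], variance in units of sigma^2)."""
    q = 1.0 / (2.0 * lam ** 2)
    variance = 2.0 * q * lam ** 2
    tail = 2.0 * q
    return tail, variance
```

The three-point distribution has variance exactly $\sigma^2$ and tail exactly $1/\lambda^2$, which is why the inequality cannot be improved without further assumptions on $X$.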
Suppose we have a decomposition

\[ X = X_1 + \cdots + X_m . \]

Then $\mathrm{Var}[X]$ may be computed by the formula

\[ \mathrm{Var}[X] = \sum_{i=1}^{m} \mathrm{Var}[X_i] + \sum_{i \ne j} \mathrm{Cov}[X_i, X_j] . \]

Here the second sum is over ordered pairs and the covariance $\mathrm{Cov}[Y, Z]$ is defined
by

\[ \mathrm{Cov}[Y, Z] = E[YZ] - E[Y]E[Z] . \]

In general, if $Y, Z$ are independent then $\mathrm{Cov}[Y, Z] = 0$. This often simplifies
variance calculations considerably. Now suppose further, as will generally be the
case in our applications, that the $X_i$ are indicator random variables; that is, $X_i = 1$
if a certain event $A_i$ holds and otherwise $X_i = 0$. If $X_i$ is one with probability
$p_i = \Pr[A_i]$ then

\[ \mathrm{Var}[X_i] = p_i(1 - p_i) \le p_i = E[X_i] , \]

and so

\[ \mathrm{Var}[X] \le E[X] + \sum_{i \ne j} \mathrm{Cov}[X_i, X_j] . \]

4.2 NUMBER THEORY 


The second moment method is an effective tool in number theory. Let $\nu(n)$ denote the
number of primes $p$ dividing $n$. (We do not count multiplicity though it would make
little difference.) The following result says, roughly, that “almost all” $n$ have “very
close to” $\ln\ln n$ prime factors. This was first shown by Hardy and Ramanujan in
1920 by a quite complicated argument. We give a remarkably simple proof of Turán
(1934), a proof that played a key role in the development of probabilistic methods in
number theory.

Theorem 4.2.1 Let $\omega(n) \to \infty$ arbitrarily slowly. Then the number of $x$ in
$\{1, \ldots, n\}$ such that

\[ |\nu(x) - \ln\ln n| > \omega(n)\sqrt{\ln\ln n} \]

is $o(n)$.


Proof. Let $x$ be randomly chosen from $\{1, \ldots, n\}$. For $p$ prime set

\[ X_p = \begin{cases} 1 & \text{if } p \mid x, \\ 0 & \text{otherwise.} \end{cases} \]

Set $M = n^{1/10}$ and set $X = \sum X_p$, the summation over all primes $p \le M$. As no
$x \le n$ can have more than ten prime factors larger than $M$ we have $\nu(x) - 10 \le
X(x) \le \nu(x)$ so that large deviation bounds on $X$ will translate into asymptotically
similar bounds for $\nu$. [Here 10 could be any (large) constant.] Now

\[ E[X_p] = \frac{\lfloor n/p \rfloor}{n} . \]

As $y - 1 < \lfloor y \rfloor \le y$,

\[ E[X_p] = 1/p + O(1/n) . \]

By linearity of expectation,

\[ E[X] = \sum_{p \le M} \left( \frac{1}{p} + O\left(\frac{1}{n}\right) \right) = \ln\ln n + O(1) , \]

where here we used the well-known fact that $\sum_{p \le x}(1/p) = \ln\ln x + O(1)$, which
can be proved by combining Stirling's formula with Abel summation.
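Now we find an asymptotic expression for

\[ \mathrm{Var}[X] = \sum_{p \le M} \mathrm{Var}[X_p] + \sum_{p \ne q} \mathrm{Cov}[X_p, X_q] . \]

As $\mathrm{Var}[X_p] = (1/p)(1 - 1/p) + O(1/n)$,

\[ \sum_{p \le M} \mathrm{Var}[X_p] = \sum_{p \le M} \frac{1}{p} + O(1) = \ln\ln n + O(1) . \]

With $p, q$ distinct primes, $X_p X_q = 1$ if and only if $p \mid x$ and $q \mid x$, which occurs if and
only if $pq \mid x$. Hence

\[ \begin{aligned}
\mathrm{Cov}[X_p, X_q] &= E[X_p X_q] - E[X_p]E[X_q] \\
&= \frac{\lfloor n/pq \rfloor}{n} - \frac{\lfloor n/p \rfloor}{n}\,\frac{\lfloor n/q \rfloor}{n} \\
&\le \frac{1}{pq} - \left(\frac{1}{p} - \frac{1}{n}\right)\left(\frac{1}{q} - \frac{1}{n}\right) \\
&\le \frac{1}{n}\left(\frac{1}{p} + \frac{1}{q}\right) .
\end{aligned} \]

Thus

\[ \sum_{p \ne q} \mathrm{Cov}[X_p, X_q] \le \frac{1}{n} \sum_{p \ne q} \left(\frac{1}{p} + \frac{1}{q}\right) \le \frac{2M}{n} \sum_{p \le M} \frac{1}{p} . \]

Thus

\[ \sum_{p \ne q} \mathrm{Cov}[X_p, X_q] \le O(n^{-9/10} \ln\ln n) = o(1) , \]

and similarly

\[ \sum_{p \ne q} \mathrm{Cov}[X_p, X_q] \ge -o(1) . \]

That is, the covariances do not affect the variance, $\mathrm{Var}[X] = \ln\ln n + O(1)$ and
Chebyshev's Inequality actually gives

\[ \Pr\left[|X - \ln\ln n| > \lambda\sqrt{\ln\ln n}\right] < \lambda^{-2} + o(1) \]

for any constant $\lambda > 0$. As $|X - \nu| \le 10$ the same holds for $\nu$. ∎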
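The statement can be observed empirically with a sieve. The following sketch computes $\nu(x)$ for all $x \le n$ and compares the sample mean and variance with $\ln\ln n$; the function names are assumptions made for illustration. Since the theorem only controls the mean and variance up to $O(1)$, and $\ln\ln n$ grows extremely slowly, close numerical agreement should not be expected at accessible $n$.

```python
import math

def distinct_prime_counts(n):
    """nu[x] = number of distinct primes dividing x, for 0 <= x <= n."""
    nu = [0] * (n + 1)
    for p in range(2, n + 1):
        if nu[p] == 0:                  # no smaller prime divides p, so p is prime
            for multiple in range(p, n + 1, p):
                nu[multiple] += 1
    return nu

def summary(n):
    """Return (mean of nu, variance of nu, ln ln n) over x = 1, ..., n."""
    nu = distinct_prime_counts(n)[1:]
    mean = sum(nu) / n
    var = sum((t - mean) ** 2 for t in nu) / n
    return mean, var, math.log(math.log(n))

def fraction_outside(n, lam):
    """Fraction of x <= n with |nu(x) - ln ln n| > lam * sqrt(ln ln n)."""
    nu = distinct_prime_counts(n)[1:]
    center = math.log(math.log(n))
    width = lam * math.sqrt(center)
    return sum(1 for t in nu if abs(t - center) > width) / n
```

Comparing `fraction_outside(n, lam)` with $\lambda^{-2}$ shows how conservative the Chebyshev bound is here.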


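In a classic paper Erdős and Kac (1940) showed, essentially, that $\nu$ does behave
like a normal distribution with mean and variance $\ln\ln n$. Here is their precise result.


Theorem 4.2.2 Let $\lambda$ be fixed, positive, negative or zero. Then

\[ \lim_{n \to \infty} \frac{1}{n} \left| \left\{ x : 1 \le x \le n,\ \nu(x) \ge \ln\ln n + \lambda\sqrt{\ln\ln n} \right\} \right| = \int_{\lambda}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\, dt . \]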


Proof. We outline the argument, emphasizing the similarities to Turán's proof.
Fix a function $s(n)$ with $s(n) \to \infty$ and $s(n) = o((\ln\ln n)^{1/2})$, for example
$s(n) = \ln\ln\ln n$. Set $M = n^{1/s(n)}$. Set $X = \sum X_p$, the summation over all primes
$p \le M$. As no $x \le n$ can have more than $s(n)$ prime factors greater than $M$ we
have $\nu(x) - s(n) \le X(x) \le \nu(x)$ so that it suffices to show Theorem 4.2.2 with $\nu$
replaced by $X$. Let $Y_p$ be independent random variables with $\Pr[Y_p = 1] = 1/p$,
$\Pr[Y_p = 0] = 1 - 1/p$ and set $Y = \sum Y_p$, the summation over all primes $p \le M$.
This $Y$ represents an idealized version of $X$. Set

\[ \mu = E[Y] = \sum_{p \le M} \frac{1}{p} = \ln\ln n + o\left((\ln\ln n)^{1/2}\right) \]

and

\[ \sigma^2 = \mathrm{Var}[Y] = \sum_{p \le M} \frac{1}{p}\left(1 - \frac{1}{p}\right) \sim \ln\ln n \]

and define the normalized $\tilde{Y} = (Y - \mu)/\sigma$. From the Central Limit Theorem $\tilde{Y}$
approaches the standard normal $N$ and $E[\tilde{Y}^k] \to E[N^k]$ for every positive integer
$k$. Set $\tilde{X} = (X - \mu)/\sigma$. We compare $\tilde{X}, \tilde{Y}$.

For any distinct primes $p_1, \ldots, p_s \le M$,

\[ E[X_{p_1} \cdots X_{p_s}] - E[Y_{p_1} \cdots Y_{p_s}] = \frac{1}{n}\left\lfloor \frac{n}{p_1 \cdots p_s} \right\rfloor - \frac{1}{p_1 \cdots p_s} = O\left(\frac{1}{n}\right) . \]

We let $k$ be an arbitrary fixed positive integer and compare $E[\tilde{X}^k]$ and $E[\tilde{Y}^k]$.
Expanding, $\tilde{X}^k$ is a polynomial in $X$ with coefficients $n^{o(1)}$. Further expanding each
$X^j = (\sum X_p)^j$, always reducing $X_p^a$ to $X_p$ when $a \ge 2$, gives the sum of
$O(M^k) = n^{o(1)}$ terms of the form $X_{p_1} \cdots X_{p_s}$. The same expansion applies to $\tilde{Y}$.
As the corresponding terms have expectations within $O(1/n)$ the total difference

\[ E[\tilde{X}^k] - E[\tilde{Y}^k] = n^{-1+o(1)} = o(1) . \]

Hence each moment of $\tilde{X}$ approaches that of the standard normal $N$. A standard,
though nontrivial, theorem in probability theory gives that $\tilde{X}$ must therefore approach
$N$ in distribution. ∎


We recall the famous quotation of G. H. Hardy: 
317 is a prime, not because we think so, or because our minds are shaped in
one way rather than another, but because it is so, because mathematical reality
is built that way.


How ironic — though not contradictory — that the methods of probability theory 
can lead to a greater understanding of the prime factorization of integers. Additional 
results applying information about the moments of a distribution in order to determine 
it appear in Chapter 8; see also Billingsley (1995). 


4.3 MORE BASICS 


Let $X$ be a nonnegative integral valued random variable and suppose we want to
bound $\Pr[X = 0]$ given the value $\mu = E[X]$. If $\mu < 1$ we may use the inequality

\[ \Pr[X > 0] \le E[X] \]

so that if $E[X] \to 0$ then $X = 0$ almost always. (Here we are imagining an infinite
sequence of $X$ dependent on some parameter $n$ going to infinity.) But now suppose
$E[X] \to \infty$. It does not necessarily follow that $X > 0$ almost always. For example,
let $X$ be the number of deaths due to nuclear war in the twelve months after reading
this paragraph. Calculation of $E[X]$ can make for lively debate but few would deny
that it is quite large. Yet we may believe (or hope) that $\Pr[X \ne 0]$ is very
close to zero. We can sometimes deduce $X > 0$ almost always if we have further
information about $\mathrm{Var}[X]$.


Theorem 4.3.1 $\Pr[X = 0] \le \dfrac{\mathrm{Var}[X]}{E[X]^2}$.

Proof. Set $\lambda = \mu/\sigma$ in Chebyshev's Inequality. Then

\[ \Pr[X = 0] \le \Pr[|X - \mu| \ge \lambda\sigma] \le \frac{1}{\lambda^2} = \frac{\sigma^2}{\mu^2} . \]
∎

We generally apply this result in asymptotic terms.

Corollary 4.3.2 If $\mathrm{Var}[X] = o(E[X]^2)$ then $X > 0$ almost always.

The proof of Theorem 4.3.1 actually gives that, for any $\varepsilon > 0$,

\[ \Pr\left[|X - E[X]| \ge \varepsilon E[X]\right] \le \frac{\mathrm{Var}[X]}{\varepsilon^2 E[X]^2} \]

and thus in asymptotic terms we actually have the following stronger assertion.

Corollary 4.3.3 If $\mathrm{Var}[X] = o(E[X]^2)$ then $X \sim E[X]$ almost always.


Suppose again $X = X_1 + \cdots + X_m$, where $X_i$ is the indicator random variable
for event $A_i$. For indices $i, j$ write $i \sim j$ if $i \ne j$ and the events $A_i, A_j$ are not
independent. We set (the sum is over ordered pairs)

\[ \Delta = \sum_{i \sim j} \Pr[A_i \wedge A_j] . \]

Note that when $i \sim j$,

\[ \mathrm{Cov}[X_i, X_j] = E[X_i X_j] - E[X_i]E[X_j] \le E[X_i X_j] = \Pr[A_i \wedge A_j] \]

and that when $i \ne j$ and not $i \sim j$ then $\mathrm{Cov}[X_i, X_j] = 0$. Thus

\[ \mathrm{Var}[X] \le E[X] + \Delta . \]


Corollary 4.3.4 If $E[X] \to \infty$ and $\Delta = o(E[X]^2)$ then $X > 0$ almost always.
Furthermore $X \sim E[X]$ almost always.


Let us say $X_1, \ldots, X_m$ are symmetric if for every $i \ne j$ there is a measure
preserving mapping of the underlying probability space that sends event $A_i$ to event
$A_j$. Examples will appear in the next section. In this instance we write

\[ \Delta = \sum_{i \sim j} \Pr[A_i \wedge A_j] = \sum_i \Pr[A_i] \sum_{j \sim i} \Pr[A_j \mid A_i] \]

and note that the inner summation is independent of $i$. We set

\[ \Delta^* = \sum_{j \sim i} \Pr[A_j \mid A_i] , \]

where $i$ is any fixed index. Then

\[ \Delta = \sum_i \Pr[A_i]\, \Delta^* = \Delta^* \sum_i \Pr[A_i] = \Delta^* E[X] . \]


Corollary 4.3.5 If $E[X] \to \infty$ and $\Delta^* = o(E[X])$ then $X > 0$ almost always.
Furthermore $X \sim E[X]$ almost always.


The condition of Corollary 4.3.5 has the intuitive sense that conditioning on any
specific $A_i$ holding does not substantially increase the expected number $E[X]$ of
events holding.


4.4 RANDOM GRAPHS 


The random graph $G(n, p)$ is, informally, the graph on $n$ labeled vertices, obtained
by selecting each pair of vertices to be an edge, randomly and independently, with
probability $p$. A property of graphs is a family of graphs closed under isomorphism.
A function $r(n)$ is a threshold function for some property $P$, if whenever $p = p(n) \ll
r(n)$ then $G(n, p)$ does not satisfy $P$ almost always, and whenever $p \gg r(n)$ then
$G(n, p)$ satisfies $P$ almost always. For more precise definitions of the random graph
$G(n, p)$ and of threshold functions, see Section 10.1.

The results of this section are generally surpassed by those of Chapter 10 but
they were historically the first results and provide a good illustration of the second
moment. We begin with a particular example. By $\omega(G)$ we denote here and in the
rest of the book the number of vertices in the maximum clique of the graph $G$.


Theorem 4.4.1 The property $\omega(G) \ge 4$ has threshold function $n^{-2/3}$.


Proof. For every 4-set $S$ of vertices in $G(n, p)$ let $A_S$ be the event “$S$ is a clique”
and $X_S$ its indicator random variable. Then

\[ E[X_S] = \Pr[A_S] = p^6 \]

as six different edges must all lie in $G(n, p)$. Set

\[ X = \sum_{|S|=4} X_S \]

so that $X$ is the number of 4-cliques in $G$ and $\omega(G) \ge 4$ if and only if $X > 0$.
Linearity of expectation gives

\[ E[X] = \sum_{|S|=4} E[X_S] = \binom{n}{4} p^6 \sim \frac{n^4 p^6}{24} . \]

When $p(n) \ll n^{-2/3}$, $E[X] = o(1)$ and so $X = 0$ almost surely.

Now suppose $p(n) \gg n^{-2/3}$ so that $E[X] \to \infty$ and consider the $\Delta^*$ of Corollary 4.3.5.
(All 4-sets “look the same” so that the $X_S$ are symmetric.) Here
$S \sim T$ if and only if $S \ne T$ and $S, T$ have common edges; that is, if and only
if $|S \cap T| = 2$ or 3. Fix $S$. There are $O(n^2)$ sets $T$ with $|S \cap T| = 2$ and for each of
these $\Pr[A_T \mid A_S] = p^5$. There are $O(n)$ sets $T$ with $|S \cap T| = 3$ and for each of
these $\Pr[A_T \mid A_S] = p^3$. Thus

\[ \Delta^* = O(n^2 p^5) + O(n p^3) = o(n^4 p^6) = o(E[X]) \]

since $p \gg n^{-2/3}$. Corollary 4.3.5 therefore applies and $X > 0$; that is, there does
exist a clique of size 4, almost always. ∎
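For 4-cliques the quantities $E[X]$ and $\Delta^*$ can be written down exactly, which makes the threshold visible for concrete $n$. The sketch below is illustrative; the function name is an assumption. It uses the chain $\Pr[X = 0] \le \mathrm{Var}[X]/E[X]^2 \le (E[X] + \Delta)/E[X]^2 = (1 + \Delta^*)/E[X]$, which follows from Theorem 4.3.1 together with $\mathrm{Var}[X] \le E[X] + \Delta$ and $\Delta = \Delta^* E[X]$.

```python
from math import comb

def clique4_second_moment(n, p):
    """Return (E[X], Delta*, upper bound on Pr[X = 0]) for 4-cliques in G(n, p)."""
    expectation = comb(n, 4) * p ** 6
    # T sharing exactly 2 vertices with S needs 5 further edges; sharing 3 needs 3.
    delta_star = (comb(4, 2) * comb(n - 4, 2) * p ** 5
                  + comb(4, 3) * (n - 4) * p ** 3)
    bound = (1.0 + delta_star) / expectation
    return expectation, delta_star, bound
```

Evaluating this at $p$ a constant factor above and below $n^{-2/3}$ shows $E[X]$ moving from well below 1 to very large while the bound on $\Pr[X = 0]$ becomes small.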
The proof of Theorem 4.4.1 appears to require a fortuitous calculation of $\Delta^*$. The
following definitions pave the way for the more general Theorem 4.4.2.


Definition 1 Let $H$ be a graph with $v$ vertices and $e$ edges. We call $\rho(H) = e/v$ the
density of $H$. We call $H$ balanced if every subgraph $H'$ has $\rho(H') \le \rho(H)$. We call
$H$ strictly balanced if every proper subgraph $H'$ has $\rho(H') < \rho(H)$.


Examples. $K_4$ and, in general, $K_k$ are strictly balanced. The graph

[figure not reproduced]

is not balanced as it has density 7/5 while the subgraph $K_4$ has density 3/2. The
graph

[figure not reproduced]

is balanced but not strictly balanced as it and its subgraph $K_4$ have density 3/2.
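
For small graphs, Definition 1 can be checked by brute force. The sketch below is ours, not the book's: the helper names are hypothetical, and the two test graphs ($K_4$ with a pendant vertex, and $K_4$ with a path of two extra vertices attached) are our own choices with densities 7/5 and 3/2, not necessarily the graphs in the omitted figures. It suffices to examine induced subgraphs, since among all subgraphs on a given vertex set the induced one has the most edges.

```python
from fractions import Fraction
from itertools import combinations


def induced_densities(vertices, edges):
    """Yield (vertex subset, density) for every nonempty induced subgraph."""
    vertices = list(vertices)
    for size in range(1, len(vertices) + 1):
        for subset in combinations(vertices, size):
            s = set(subset)
            inside = sum(1 for (a, b) in edges if a in s and b in s)
            yield s, Fraction(inside, size)


def is_balanced(vertices, edges) -> bool:
    """Every subgraph has density at most rho(H) = e / v."""
    vertices = list(vertices)
    rho = Fraction(len(edges), len(vertices))
    return all(d <= rho for _, d in induced_densities(vertices, edges))


def is_strictly_balanced(vertices, edges) -> bool:
    """Every proper subgraph has density strictly below rho(H).

    Proper subgraphs on the full vertex set have fewer edges, hence smaller
    density (assuming H has at least one edge), so only proper vertex
    subsets need to be examined.
    """
    vertices = list(vertices)
    full = set(vertices)
    rho = Fraction(len(edges), len(vertices))
    return all(d < rho for s, d in induced_densities(vertices, edges) if s != full)


K4 = list(combinations(range(4), 2))
K4_PENDANT = K4 + [(3, 4)]                 # 5 vertices, 7 edges
K4_WITH_PATH = K4 + [(3, 4), (4, 5), (0, 5)]  # 6 vertices, 9 edges
```

With these definitions `is_strictly_balanced(range(4), K4)` holds, `is_balanced(range(5), K4_PENDANT)` fails, and `K4_WITH_PATH` is balanced but not strictly balanced.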


Theorem 4.4.2 Let $H$ be a balanced graph with $v$ vertices and $e$ edges. Let $A(G)$
be the event that $H$ is a subgraph (not necessarily induced) of $G$. Then $p = n^{-v/e}$
is the threshold function for $A$.


Proof. We follow the argument of Theorem 4.4.1. For each $v$-set $S$ let $A_S$ be the
event that $G|_S$ contains $H$ as a subgraph. Then

$$p^e \le \Pr[A_S] \le v!\,p^e.$$

(Any particular placement of $H$ has probability $p^e$ of occurring and there are at most
$v!$ possible placements. The precise calculation of $\Pr[A_S]$ is, in general, complicated
due to the overlapping of potential copies of $H$.) Let $X_S$ be the indicator random
variable for $A_S$ and

$$X = \sum_{|S|=v} X_S$$

so that $A$ holds if and only if $X > 0$. Linearity of expectation gives

$$E[X] = \sum_{|S|=v} E[X_S] = \binom{n}{v}\Pr[A_S] = \Theta(n^v p^e).$$

If $p \ll n^{-v/e}$ then $E[X] = o(1)$, so $X = 0$ almost always.

Now assume $p \gg n^{-v/e}$ so that $E[X] \to \infty$ and consider the $\Delta^*$ of
Corollary 4.3.5. (All $v$-sets look the same so the $X_S$ are symmetric.) Here $S \sim T$ if and
only if $S \ne T$ and $S, T$ have common edges; that is, if and only if $|S \cap T| = i$ with
$2 \le i \le v - 1$. Let $S$ be fixed. We split

$$\Delta^* = \sum_{T \sim S} \Pr[A_T \mid A_S] = \sum_{i=2}^{v-1} \sum_{|T \cap S| = i} \Pr[A_T \mid A_S].$$

For each $i$ there are $O(n^{v-i})$ choices of $T$. Fix $S, T$ and consider $\Pr[A_T \mid A_S]$.
There are $O(1)$ possible copies of $H$ on $T$. Each has (since, critically, $H$ is
balanced) at most $ie/v$ edges with both vertices in $S$ and thus at least $e - (ie/v)$
other edges. Hence

$$\Pr[A_T \mid A_S] = O(p^{e - (ie/v)})$$

and

$$\Delta^* = \sum_{i=2}^{v-1} O(n^{v-i} p^{e-(ie/v)}) = \sum_{i=2}^{v-1} O((n^v p^e)^{1 - i/v}) = \sum_{i=2}^{v-1} o(n^v p^e) = o(E[X])$$

since $n^v p^e \to \infty$. Hence Corollary 4.3.5 applies. ∎


Theorem 4.4.3 In the notation of Theorem 4.4.2 if $H$ is not balanced then $p = n^{-v/e}$
is not the threshold function for $A$.


Proof. Let $H_1$ be a subgraph of $H$ with $v_1$ vertices, $e_1$ edges and $e_1/v_1 > e/v$. Let
$\alpha$ satisfy $v_1/e_1 < \alpha < v/e$ and set $p = n^{-\alpha}$. The expected number of copies of $H_1$
is then $o(1)$ so almost always $G(n,p)$ contains no copy of $H_1$. But if it contains no
copy of $H_1$ then it surely can contain no copy of $H$. Since $\alpha < v/e$ we have
$p \gg n^{-v/e}$, so $n^{-v/e}$ cannot be the threshold function. ∎


The threshold function for the property of containing a copy of $H$, for general $H$,
was examined in the original papers of Erdős and Rényi. The paper Erdős and Rényi (1960)
still provides an excellent introduction to the theory of random graphs. Let $H_1$ be that
subgraph with maximal density $\rho(H_1) = e_1/v_1$. (When $H$ is balanced we may take $H_1 = H$.)
They showed that $p = n^{-v_1/e_1}$ is the threshold function. We do not show this here
though it follows fairly straightforwardly from these methods.

We finish this section with two strengthenings of Theorem 4.4.2. 


Theorem 4.4.4 Let $H$ be strictly balanced with $v$ vertices, $e$ edges and $a$ automorphisms.
Let $X$ be the number of copies of $H$ in $G(n,p)$. Assume $p \gg n^{-v/e}$. Then
almost always

$$X \sim \frac{n^v p^e}{a}.$$

Proof. Label the vertices of $H$ by $1, \dots, v$. For each ordered $x_1, \dots, x_v$ let $A_{x_1,\dots,x_v}$
be the event that $x_1, \dots, x_v$ provides a copy of $H$ in that order. Specifically we define

$$A_{x_1,\dots,x_v}:\quad \{i,j\} \in E(H) \Rightarrow \{x_i, x_j\} \in E(G).$$

We let $I_{x_1,\dots,x_v}$ be the corresponding indicator random variable. We define an
equivalence relation on $v$-tuples by setting $(x_1,\dots,x_v) \equiv (y_1,\dots,y_v)$ if there is an
automorphism $\sigma$ of $V(H)$ so that $y_{\sigma(i)} = x_i$ for $1 \le i \le v$. Then

$$X = \sum I_{x_1,\dots,x_v}$$

gives the number of copies of $H$ in $G$ where the sum is taken over one entry from
each equivalence class. As there are $(n)_v/a$ terms,

$$E[X] = \frac{(n)_v}{a} E[I_{x_1,\dots,x_v}] = \frac{(n)_v\, p^e}{a} \sim \frac{n^v p^e}{a}.$$

Our assumption $p \gg n^{-v/e}$ implies $E[X] \to \infty$. It suffices therefore to show
$\Delta^* = o(E[X])$. Fixing $x_1, \dots, x_v$,

$$\Delta^* = \sum_{(y_1,\dots,y_v) \sim (x_1,\dots,x_v)} \Pr[A_{(y_1,\dots,y_v)} \mid A_{(x_1,\dots,x_v)}].$$

There are $v!/a = O(1)$ terms with $\{y_1,\dots,y_v\} = \{x_1,\dots,x_v\}$ and for each the
conditional probability is at most 1 (actually, at most $p$), thus contributing $O(1) =
o(E[X])$ to $\Delta^*$. When $\{y_1,\dots,y_v\} \cap \{x_1,\dots,x_v\}$ has $i$ elements, $2 \le i \le v-1$, the
argument of Theorem 4.4.2 gives that the contribution to $\Delta^*$ is $o(E[X])$. Altogether
$\Delta^* = o(E[X])$ and we apply Corollary 4.3.5. ∎
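
The quantity $(n)_v p^e / a$ from the proof of Theorem 4.4.4 is easy to evaluate for small $H$. The following sketch (with hypothetical helper names of our choosing) counts the automorphisms of $H$ by testing all $v!$ vertex permutations, which is feasible only for small $v$, and then evaluates the exact expectation.

```python
from itertools import permutations
from math import perm


def count_automorphisms(v: int, edges) -> int:
    """Number of permutations of {0..v-1} mapping every edge of H to an edge of H."""
    edge_set = {frozenset(e) for e in edges}
    return sum(
        all(frozenset((sigma[a], sigma[b])) in edge_set for a, b in edges)
        for sigma in permutations(range(v))
    )


def expected_copies(n: int, p: float, v: int, edges) -> float:
    """E[X] = (n)_v * p^e / a, where (n)_v = n (n-1) ... (n-v+1)."""
    a = count_automorphisms(v, edges)
    return perm(n, v) * p ** len(edges) / a


TRIANGLE = [(0, 1), (1, 2), (0, 2)]
PATH3 = [(0, 1), (1, 2)]
```

For the triangle, $a = 6$ and the formula reduces to $\binom{n}{3}p^3$, in agreement with a direct count.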
Theorem 4.4.5 Let $H$ be any fixed graph. For every subgraph $H'$ of $H$ (including
$H$ itself) let $X_{H'}$ denote the number of copies of $H'$ in $G(n,p)$. Assume $p$ is such
that $E[X_{H'}] \to \infty$ for every $H'$. Then

$$X_H \sim E[X_H]$$

almost always.


Proof. Let $H$ have $v$ vertices and $e$ edges. As in Theorem 4.4.4 it suffices to
show $\Delta^* = o(E[X])$. We split $\Delta^*$ into a finite number of terms. For each $H'$
with $w$ vertices and $f$ edges we have those $(y_1,\dots,y_v)$ that overlap with the fixed
$(x_1,\dots,x_v)$ in a copy of $H'$. These terms contribute, up to constants,

$$n^{v-w} p^{e-f} = \Theta\!\left(\frac{E[X_H]}{E[X_{H'}]}\right) = o(E[X_H])$$

to $\Delta^*$. Hence Corollary 4.3.5 does apply. ∎


4.5 CLIQUE NUMBER 


Now we fix edge probability $p = \frac{1}{2}$ and consider the clique number $\omega(G)$. We set

$$f(k) = \binom{n}{k} 2^{-\binom{k}{2}},$$

the expected number of $k$-cliques. The function $f(k)$ drops under one at $k \sim 2\log_2 n$.
[Very roughly, $f(k)$ is like $n^k 2^{-k^2/2}$.]


Theorem 4.5.1 Let $k = k(n)$ satisfy $k \sim 2\log_2 n$ and $f(k) \to \infty$. Then almost
always $\omega(G) \ge k$.


Proof. For each $k$-set $S$ let $A_S$ be the event "$S$ is a clique" and $X_S$ the corresponding
indicator random variable. We set

$$X = \sum_{|S|=k} X_S$$

so that $\omega(G) \ge k$ if and only if $X > 0$. Then $E[X] = f(k) \to \infty$ and we examine
the $\Delta^*$ of Corollary 4.3.5. Fix $S$ and note that $T \sim S$ if and only if $|T \cap S| = i$,
where $2 \le i \le k - 1$. Hence

$$\Delta^* = \sum_{i=2}^{k-1} \binom{k}{i}\binom{n-k}{k-i} 2^{\binom{i}{2} - \binom{k}{2}}$$

and so

$$\frac{\Delta^*}{E[X]} = \sum_{i=2}^{k-1} g(i),$$

where we set

$$g(i) = \frac{\binom{k}{i}\binom{n-k}{k-i}}{\binom{n}{k}}\, 2^{\binom{i}{2}}.$$

Observe that $g(i)$ may be thought of as the probability that a randomly chosen $T$
will intersect a fixed $S$ in $i$ points times the factor increase in $\Pr[A_T]$ when it does.
Setting $i = 2$,

$$g(2) = 2\,\frac{\binom{k}{2}\binom{n-k}{k-2}}{\binom{n}{k}} \sim \frac{k^4}{n^2} \le o(1/n).$$

At the other extreme $i = k - 1$,

$$g(k-1) = \frac{k(n-k)2^{-(k-1)}}{\binom{n}{k} 2^{-\binom{k}{2}}} \sim \frac{2kn2^{-k}}{E[X]}.$$

As $k \sim 2\log_2 n$, the numerator is $n^{-1+o(1)}$. The denominator approaches infinity
and so $g(k-1) \le o(1/n)$. Some detailed calculation (which we omit) gives that the
remaining $g(i)$ and their sum are also negligible so that Corollary 4.3.5 applies. ∎


Theorem 4.5.1 leads to a strong concentration result for $\omega(G)$. For $k \sim 2\log_2 n$,

$$\frac{f(k+1)}{f(k)} = \frac{n-k}{k+1}\, 2^{-k} = n^{-1+o(1)}.$$

Let $k_0 = k_0(n)$ be that value with $f(k_0) \ge 1 > f(k_0 + 1)$. For "most" $n$ the
function $f(k)$ will jump from a large $f(k_0)$ to a small $f(k_0 + 1)$. The probability
that $G$ contains a clique of size $k_0 + 1$ is at most $f(k_0 + 1)$, which will be very small.
When $f(k_0)$ is large, Theorem 4.5.1 implies that $G$ contains a clique of size $k_0$ with
probability nearly 1. Together, with very high probability $\omega(G) = k_0$. For some $n$
one of the values $f(k_0)$, $f(k_0 + 1)$ may be of moderate size so this argument does
not apply. Still one may show a strong concentration result found independently by
Bollobás and Erdős (1976) and Matula (1976).


Corollary 4.5.2 There exists $k = k(n)$ so that

$$\Pr[\omega(G) = k \text{ or } k+1] \to 1.$$


We give yet stronger results on the distribution of $\omega(G)$ in Section 10.2.
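
The value $k_0$ with $f(k_0) \ge 1 > f(k_0+1)$ is straightforward to compute exactly. The sketch below is our own illustration (the names `f` and `k0` mirror the text, but the code is not from the book); it uses exact rational arithmetic so that no floating-point overflow occurs.

```python
from fractions import Fraction
from math import comb, log2


def f(n: int, k: int) -> Fraction:
    """Expected number of k-cliques in G(n, 1/2): C(n, k) * 2^(-C(k, 2))."""
    return Fraction(comb(n, k), 2 ** comb(k, 2))


def k0(n: int) -> int:
    """The value k_0 with f(k_0) >= 1 > f(k_0 + 1).

    f first increases from f(1) = n and then decreases, because the ratio
    f(k+1)/f(k) = (n-k) / ((k+1) 2^k) is decreasing in k, so the first k
    with f(k + 1) < 1 is the unique crossing point.
    """
    k = 1
    while f(n, k + 1) >= 1:
        k += 1
    return k


if __name__ == "__main__":
    for n in (100, 1000, 10_000):
        k = k0(n)
        print(n, k, float(f(n, k)), float(f(n, k + 1)), 2 * log2(n))
```

The printed values show how far below $2\log_2 n$ the crossing point sits for moderate $n$; the asymptotic relation $k_0 \sim 2\log_2 n$ sets in slowly.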


4.6 DISTINCT SUMS 


A set $x_1, \dots, x_k$ of positive integers is said to have distinct sums if all sums

$$\sum_{i \in S} x_i, \qquad S \subseteq \{1, \dots, k\},$$

are distinct. Let $f(n)$ denote the maximal $k$ for which there exists a set

$$\{x_1, \dots, x_k\} \subset \{1, \dots, n\}$$

with distinct sums. The simplest example of a set with distinct sums is $\{2^i : i \le
\log_2 n\}$. This example shows

$$f(n) \ge 1 + \lfloor \log_2 n \rfloor.$$

Erdős offered \$300 for a proof or disproof that

$$f(n) \le \log_2 n + C$$

for some constant $C$. From above, as all $2^{f(n)}$ sums are distinct and less than $nk$,

$$2^{f(n)} < nk = nf(n),$$

and so

$$f(n) < \log_2 n + \log_2\log_2 n + O(1).$$

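Examination of the second moment gives a modest improvement. Fix $\{x_1, \dots, x_k\} \subset
\{1, \dots, n\}$ with distinct sums. Let $\varepsilon_1, \dots, \varepsilon_k$ be independent with

$$\Pr[\varepsilon_i = 1] = \Pr[\varepsilon_i = 0] = 1/2$$

and set

$$X = \varepsilon_1 x_1 + \cdots + \varepsilon_k x_k.$$

(We may think of $X$ as a random sum.) Set

$$\mu = E[X] = \frac{x_1 + \cdots + x_k}{2}$$

and $\sigma^2 = \mathrm{Var}[X]$. We bound

$$\sigma^2 = \frac{x_1^2 + \cdots + x_k^2}{4} \le \frac{n^2 k}{4}$$

so that $\sigma \le n\sqrt{k}/2$. By Chebyshev's Inequality for any $\lambda > 1$,

$$\Pr\left[|X - \mu| \ge \lambda n\sqrt{k}/2\right] \le \lambda^{-2}.$$

Reversing,

$$1 - \frac{1}{\lambda^2} \le \Pr\left[|X - \mu| < \lambda n\sqrt{k}/2\right].$$

But $X$ has any particular value with probability either zero or $2^{-k}$ since, critically, a
sum can be achieved in at most one way. Thus

$$\Pr\left[|X - \mu| < \lambda n\sqrt{k}/2\right] \le 2^{-k}(\lambda n\sqrt{k} + 1)$$

and

$$n \ge \frac{2^k(1 - \lambda^{-2}) - 1}{\sqrt{k}\,\lambda}.$$

While $\lambda = \sqrt{3}$ gives optimal results any choice of $\lambda > 1$ gives the following.

Theorem 4.6.1 $f(n) \le \log_2 n + \frac{1}{2}\log_2\log_2 n + O(1)$.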
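
The distinct sums property and the second moment bound on $n$ are both easy to test on small sets. The following is a minimal sketch with function names of our own choosing; the subset enumeration is exponential in $k$ and is meant only for small examples.

```python
from itertools import combinations
from math import sqrt


def has_distinct_sums(xs) -> bool:
    """True if all 2^k subset sums of xs (including the empty sum) are distinct."""
    seen = set()
    for size in range(len(xs) + 1):
        for subset in combinations(xs, size):
            total = sum(subset)
            if total in seen:
                return False
            seen.add(total)
    return True


def second_moment_lower_bound(k: int, lam: float = sqrt(3)) -> float:
    """The bound n >= (2^k (1 - lam^-2) - 1) / (sqrt(k) * lam) for any lam > 1."""
    return (2 ** k * (1 - lam ** -2) - 1) / (sqrt(k) * lam)
```

For instance, `has_distinct_sums([1, 2, 4, 8])` is true while `has_distinct_sums([1, 2, 3])` is false because $1 + 2 = 3$. The set $\{3, 5, 6, 7\}$ also has distinct sums, which shows $f(7) \ge 4$, one more than the powers-of-two example gives.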
4.7 THE RÖDL NIBBLE


For $2 \le l < k < n$ let $M(n,k,l)$, the covering number, denote the minimal size of a
family $\mathcal{K}$ of $k$-element subsets of $\{1, \dots, n\}$ having the property that every $l$-element
set is contained in at least one $A \in \mathcal{K}$. Clearly $M(n,k,l) \ge \binom{n}{l}/\binom{k}{l}$ since each $k$-set
covers $\binom{k}{l}$ $l$-sets and every $l$-set must be covered. Equality holds if and only if the
family $\mathcal{K}$ has the property that every $l$-set is contained in exactly one $A \in \mathcal{K}$. This
is called an $(n,k,l)$ tactical configuration (or block design). For example, $(n,3,2)$
tactical configurations are better known as Steiner Triple Systems. The question of
the existence of tactical configurations is a central one for combinatorics but one for
which probabilistic methods (at least so far!) play little role. In 1963 Paul Erdős and
Haim Hanani conjectured that for fixed $2 \le l < k$,

$$\lim_{n \to \infty} \frac{M(n,k,l)}{\binom{n}{l}/\binom{k}{l}} = 1.$$

Their conjecture was, roughly, that one can get asymptotically close to a tactical
configuration. While this conjecture seemed ideal for a probabilistic analysis it was a
full generation before Rödl (1985) found the proof, which we describe in this section.
[One may similarly define the packing number $m(n,k,l)$ as the maximal size of a
family $\mathcal{K}$ of $k$-element subsets of $\{1, \dots, n\}$ having the property that every $l$-element
set is contained in at most one $A \in \mathcal{K}$. Erdős and Hanani noticed from elementary
arguments that

$$\lim_{n \to \infty} \frac{M(n,k,l)}{\binom{n}{l}/\binom{k}{l}} = 1 \iff \lim_{n \to \infty} \frac{m(n,k,l)}{\binom{n}{l}/\binom{k}{l}} = 1.$$

While the Rödl result may be formulated in terms of either packing or covering here
we deal only with the covering problem.]

Several researchers realized that the Rödl method applies in a much more general
setting, dealing with covers in uniform hypergraphs. This was first observed by
Frankl and Rödl and has been simplified and extended by Pippenger and Spencer
(1989) as well as by Kahn (1996). Our treatment here follows the one in Pippenger
and Spencer (1989) and is based on the description of Füredi (1988), where the main
tool is the second moment method.

For an $r$-uniform hypergraph $H = (V, E)$ and for a vertex $x \in V$, we let $d_H(x)$ [or
simply $d(x)$, when there is no danger of confusion] denote the degree of $x$ in $H$, that
is, the number of edges containing $x$. Similarly, for $x, y \in V$, $d(x,y) = d_H(x,y)$ is
the number of edges of $H$ containing both $x$ and $y$. A covering of $H$ is a set of edges
whose union contains all vertices. In what follows, whenever we write $\pm\delta$ we mean
a quantity between $-\delta$ and $\delta$. The following theorem is due to Pippenger, following
Frankl and Rödl.


Theorem 4.7.1 For every integer $r \ge 2$ and reals $k \ge 1$ and $a > 0$, there are
$\gamma = \gamma(r,k,a) > 0$ and $d_0 = d_0(r,k,a)$ such that for every $n \ge D \ge d_0$ the
following holds.
Every $r$-uniform hypergraph $H = (V, E)$ on a set $V$ of $n$ vertices in which all
vertices have positive degrees and which satisfies the following conditions:


(1) For all vertices $x \in V$ but at most $\gamma n$ of them, $d(x) = (1 \pm \gamma)D$.
(2) For all $x \in V$, $d(x) < kD$.
(3) For any two distinct $x, y \in V$, $d(x,y) < \gamma D$.

contains a cover of at most $(1 + a)(n/r)$ edges.


The basic idea in the proof is simple. Fixing a small $\varepsilon > 0$ one shows that a random
set of roughly $\varepsilon n/r$ edges has, with high probability, only some $O(\varepsilon^2 n)$ vertices
covered more than once, and hence covers at least $\varepsilon n - O(\varepsilon^2 n)$ vertices. Moreover,
after deleting the vertices covered, the induced hypergraph on the remaining vertices
still satisfies the properties described in (1), (2) and (3) above (for some other values
of $n$, $\gamma$, $k$ and $D$). Therefore one can choose again a random set of edges of this
hypergraph, covering roughly an $\varepsilon$-fraction of its vertices with nearly no overlaps.
Proceeding in this way for a large number of times we are finally left with at most $\varepsilon n$
uncovered vertices, and we then cover them trivially, by taking for each of them an
arbitrarily chosen edge containing it. Since $\varepsilon$ is sufficiently small, although this last
step is very inefficient, it can be tolerated.
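
A single round of this random selection is easy to simulate. The sketch below is an illustration under our own assumptions, not the construction used in the proof: we take the 3-uniform hypergraph whose vertices are the pairs of $\{0,\dots,n-1\}$ and whose edges are the triangles (so $D = n-2$ and any two vertices share at most one edge), pick each edge independently with probability $\varepsilon/D$, and compare the uncovered fraction with $e^{-\varepsilon}$. All function names are hypothetical.

```python
import random
from itertools import combinations
from math import exp


def triangle_hypergraph(n: int):
    """Vertices: pairs of {0..n-1}. Edges: the three pairs inside each triple."""
    vertices = list(combinations(range(n), 2))
    edges = [tuple(combinations(t, 2)) for t in combinations(range(n), 3)]
    return vertices, edges


def degrees(vertices, edges) -> dict:
    """Number of edges containing each vertex."""
    deg = {x: 0 for x in vertices}
    for e in edges:
        for x in e:
            deg[x] += 1
    return deg


def nibble_round(vertices, edges, D: float, eps: float, rng: random.Random):
    """Pick each edge independently with probability eps / D.

    Returns the chosen edges and the list of vertices left uncovered.
    """
    p = eps / D
    chosen = [e for e in edges if rng.random() < p]
    covered = {x for e in chosen for x in e}
    uncovered = [x for x in vertices if x not in covered]
    return chosen, uncovered


if __name__ == "__main__":
    n, eps = 30, 0.5
    vertices, edges = triangle_hypergraph(n)
    chosen, uncovered = nibble_round(vertices, edges, n - 2, eps, random.Random(0))
    print("edges chosen:", len(chosen), "expected about", eps * len(vertices) / 3)
    print("uncovered fraction:", len(uncovered) / len(vertices), "vs e^-eps =", exp(-eps))
```

The number of chosen edges should be close to $\varepsilon n'/r$ and the uncovered fraction close to $e^{-\varepsilon}$, where $n'$ is the number of vertices of the hypergraph; these are the quantities controlled in the lemma that follows.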

The technical details require a careful application of the second moment method, 
used several times in the proof of the following lemma. 


Lemma 4.7.2 For every integer $r \ge 2$ and reals $K \ge 1$ and $\varepsilon > 0$, and for every
real $\delta' > 0$, there are $\delta = \delta(r, K, \varepsilon, \delta') > 0$ and $D_0 = D_0(r, K, \varepsilon, \delta')$ such that for
every $n \ge D \ge D_0$ the following holds.

Every $r$-uniform hypergraph $H = (V, E)$ on a set $V$ of $n$ vertices which satisfies
the following conditions:


(i) For all vertices $x \in V$ but at most $\delta n$ of them, $d(x) = (1 \pm \delta)D$.
(ii) For all $x \in V$, $d(x) < KD$.
(iii) For any two distinct $x, y \in V$, $d(x,y) < \delta D$.
contains a set $E'$ of edges with the following properties:
(iv) $|E'| = (\varepsilon n/r)(1 \pm \delta')$.
(v) The set $V' = V - \bigcup_{e \in E'} e$ is of cardinality $|V'| = ne^{-\varepsilon}(1 \pm \delta')$.
(vi) For all vertices $x \in V'$ but at most $\delta'|V'|$ of them, the degree $d'(x)$ of $x$ in the
induced hypergraph of $H$ on $V'$ satisfies $d'(x) = De^{-\varepsilon(r-1)}(1 \pm \delta')$.


Proof. Throughout the proof we assume, whenever this is needed, that $D$ (and
hence $n$) is sufficiently large. We denote by $\delta_1, \delta_2, \dots$ positive constants (that can be
explicitly estimated) that tend to 0 when $\delta$ tends to 0 and $D$ tends to infinity (for fixed
$r, K, \varepsilon$). Therefore, by choosing $\delta$ and $D_0$ appropriately, we can ensure that each of
those will be smaller than $\delta'$.

Let $E'$ be a random subset of $E$ obtained by picking, randomly and independently,
each edge in $E$ to be a member of $E'$ with probability $p = \varepsilon/D$. We have to show
that with positive probability, the properties (iv), (v) and (vi) hold.

The proof that (iv) holds is easy. Note that by the assumptions $H$ has at least
$(1 - \delta)n$ vertices of degree at least $(1 - \delta)D$, showing that its number of edges
is at least $(1 - \delta)^2 nD/r$. Similarly, the number of edges of $H$ does not exceed
$((1 + \delta)Dn + \delta nKD)/r$. Therefore $|E| = (1 \pm \delta_1)Dn/r$. It follows that the
expected value of the size of $E'$ satisfies $E[|E'|] = |E|p = (1 \pm \delta_1)(\varepsilon n/r)$ and its
variance is $\mathrm{Var}[|E'|] = |E|p(1 - p) \le (1 + \delta_1)(\varepsilon n/r)$. Therefore, by Chebyshev's
Inequality, for an appropriately chosen $\delta_2 > 0$,

$$\Pr\left[|E'| = (1 \pm \delta_2)\frac{\varepsilon n}{r}\right] > 0.99,$$

say, giving (iv).

To prove (v), define for each vertex $x \in V$ an indicator random variable $I_x$ where
$I_x = 1$ if $x \notin \bigcup_{e \in E'} e$ and $I_x = 0$ otherwise. Note that $|V'| = \sum_{x \in V} I_x$. Call a
vertex $x \in V$ good if $d(x) = (1 \pm \delta)D$; otherwise call it bad. If $x$ is good, then

$$E[I_x] = \Pr[I_x = 1] = (1 - p)^{d(x)} = \left(1 - \frac{\varepsilon}{D}\right)^{(1 \pm \delta)D} = e^{-\varepsilon}(1 \pm \delta_3).$$

If $x$ is bad then, clearly, $0 \le E[I_x] \le 1$. Since there are at most $\delta n$ bad vertices it
follows, by linearity of expectation, that the expected value of $|V'|$ is $ne^{-\varepsilon}(1 \pm \delta_4)$.
To compute the variance of $|V'| = \sum_{x \in V} I_x$, note that

$$\mathrm{Var}[|V'|] = \sum_{x \in V} \mathrm{Var}[I_x] + \sum_{x,y \in V,\, x \ne y} \mathrm{Cov}[I_x, I_y] \le E[|V'|] + \sum_{x,y \in V,\, x \ne y} \mathrm{Cov}[I_x, I_y].$$

However,

$$\mathrm{Cov}[I_x, I_y] = E[I_x I_y] - E[I_x]E[I_y] = (1-p)^{d(x)+d(y)-d(x,y)} - (1-p)^{d(x)+d(y)}$$
$$\le (1-p)^{-d(x,y)} - 1 \le \left(1 - \frac{\varepsilon}{D}\right)^{-\delta D} - 1 \le \delta_5.$$

It follows that

$$\mathrm{Var}[|V'|] \le E[|V'|] + \delta_5 n^2 \le \delta_6 \left(E[|V'|]\right)^2$$

which, by Chebyshev, implies that with probability at least 0.99

$$|V'| = (1 \pm \delta_7)E[|V'|] = (1 \pm \delta_8)ne^{-\varepsilon},$$

as claimed in (v).
It remains to prove (vi). To do so note, first, that all but at most $\delta_9 n$ vertices $x$
satisfy the following two conditions:


(A) $d(x) = (1 \pm \delta)D$, and
(B) all but at most $\delta_{10}D$ edges $e \in E$ with $x \in e$ satisfy

$$|\{f \in E : x \notin f,\ f \cap e \ne \emptyset\}| = (1 \pm \delta_{11})(r-1)D. \qquad (4.1)$$

Indeed, (A) holds for all but $\delta n < \delta_9 n/2$ vertices, by assumption. Moreover, the
total number of edges containing vertices whose degrees are not $(1 \pm \delta)D$ is at most
$\delta nKD$ and hence the number of vertices contained in more than $\delta_{10}D$ such edges is
at most $\delta nKDr/(\delta_{10}D) \le \delta_9 n/2$ for an appropriate choice of $\delta_9, \delta_{10}$. Note, next,
that if $x \in e$ and $e$ contains no vertex of degree which is not $(1 \pm \delta)D$ then, since
$d(y,z) \le \delta D$ for all $y, z$, the number of edges $f$ not containing $x$ that intersect $e$
is at most $(r-1)(1+\delta)D$ and at least $(r-1)(1-\delta)D - \binom{r-1}{2}\delta D$ and hence $e$
satisfies (4.1).

It thus suffices to show that for most of the vertices $x$ satisfying (A) and (B), $d'(x)$
satisfies (vi). Fix such a vertex $x$. Call an edge $e$ with $x \in e$ good if it satisfies (4.1).
Conditioning on $x \in V'$, the probability that a good edge containing $x$ stays in the
hypergraph on $V'$ is $(1-p)^{(1 \pm \delta_{11})(r-1)D}$. Therefore the expected value of $d'(x)$ is

$$E[d'(x)] = (1 \pm \delta_{10} \pm \delta)D(1-p)^{(1 \pm \delta_{11})(r-1)D} \pm \delta_{10}D = e^{-\varepsilon(r-1)}D(1 \pm \delta_{12}).$$


For each edge $e$ containing $x$, let $I_e$ denote the indicator random variable whose
value is 1 iff $e$ is contained in $V'$. Then, the degree $d'(x)$ is simply the sum of these
indicator random variables, conditioned on $x \in V'$. It follows that

$$\mathrm{Var}[d'(x)] \le E[d'(x)] + \sum_{x \in e,\, x \in f} \mathrm{Cov}[I_e, I_f]$$
$$\le E[d'(x)] + 2\delta_{10}D^2(1 + \delta) + \sum_{x \in e,\, x \in f,\ e, f \text{ good}} \mathrm{Cov}[I_e, I_f].$$


It remains to bound the sum $\sum_{x \in e,\, x \in f,\ e, f \text{ good}} \mathrm{Cov}[I_e, I_f]$. For each fixed good $e$
this sum is a sum of the form $\sum_{x \in f,\ f \text{ good}} \mathrm{Cov}[I_e, I_f]$. There are at most $(r-1)\delta D$
edges $f$ in the last sum for which $|e \cap f| > 1$, and their contribution to the sum cannot
exceed $(r-1)\delta D$. If $e \cap f = \{x\}$ then let $t(e,f)$ denote the number of edges of $H$ that
intersect both $e$ and $f$ and do not contain $x$. Clearly, in this case, $t(e,f) \le (r-1)^2\delta D$.
It follows that for such $e$ and $f$, $\mathrm{Cov}[I_e, I_f] \le (1-p)^{-t(e,f)} - 1 \le \delta_{13}$, implying
that for each fixed good edge $e$,

$$\sum_{x \in f,\ f \text{ good}} \mathrm{Cov}[I_e, I_f] \le (r-1)\delta D + D(1 + \delta)\delta_{13} \le \delta_{14}D.$$

As the sum $\sum_{x \in e,\, x \in f,\ e, f \text{ good}} \mathrm{Cov}[I_e, I_f]$ is the sum of at most $D(1 + \delta)$ such
quantities, we conclude that

$$\mathrm{Var}[d'(x)] \le E[d'(x)] + \delta_{15}D^2 \le \delta_{16}\left(E[d'(x)]\right)^2.$$

It thus follows, by Chebyshev, that with probability at most $\delta_{17}$, $d'(x)$ is not
$(1 \pm \delta_{18})De^{-\varepsilon(r-1)}$, and therefore, by Markov, that with probability at least, say, 0.99,
for all but at most $\delta_{19}n$ vertices, $d'(x) = (1 \pm \delta_{18})De^{-\varepsilon(r-1)}$. This completes the
proof of the lemma. ∎


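Proof [Theorem 4.7.1]. Fix $\varepsilon > 0$ such that

$$\frac{\varepsilon}{1 - e^{-\varepsilon}} + r\varepsilon < 1 + a,$$

and fix $1/10 > \delta > 0$ such that

$$(1 + 4\delta)\left(\frac{\varepsilon}{1 - e^{-\varepsilon}} + r\varepsilon\right) < 1 + a.$$

Fix an integer $t$ so that $e^{-\varepsilon t} < \varepsilon$. The theorem is proved by applying the lemma $t$
times. Put $\delta = \delta_t$ and then define, by reverse induction, $\delta_t > \delta_{t-1} > \cdots > \delta_0$ such
that $\delta_i \le \delta_{i+1}e^{-\varepsilon(r-1)}$, $\prod_{i=0}^{t}(1 + \delta_i) < 1 + 2\delta$, and for $n \ge D \ge R_i$ one can
apply the lemma with $r$, $K = ke^{\varepsilon i(r-1)}$, $\varepsilon$, $\delta' = \delta_{i+1}$ and $\delta = \delta_i$. This will give the
assertion of the theorem with $\gamma = \delta_0$, $d_0 = \max R_i$. Indeed, by applying the lemma
repeatedly we obtain a decreasing sequence of sets of vertices $V = V_0, V_1, \dots, V_t$,
each contained in the previous one, and a sequence of sets of edges $E_1, E_2, \dots, E_t$,
where $E_i$ is the set of edges $E'$ obtained in the application of the lemma to the
hypergraph induced on $V_{i-1}$. Here

$$|V_i| = |V_{i-1}|e^{-\varepsilon}(1 \pm \delta_i) \quad \left(= |V_0|e^{-i\varepsilon}(1 \pm 2\delta)\right),$$

$$|E_i| = \frac{\varepsilon|V_{i-1}|}{r}(1 \pm \delta_i) \le (1 + 4\delta)\frac{\varepsilon n}{r}e^{-(i-1)\varepsilon},$$

and

$$D_i = D_{i-1}e^{-\varepsilon(r-1)} = De^{-i\varepsilon(r-1)}.$$

By covering each vertex of $V_t$ separately by an edge containing it we conclude that
the total number of edges in the cover obtained is at most

$$(1 + 4\delta)\sum_{i=0}^{t-1}\frac{\varepsilon n}{r}e^{-i\varepsilon} + |V_t| \le (1 + 4\delta)\frac{\varepsilon n}{r}\cdot\frac{1}{1 - e^{-\varepsilon}} + (1 + 2\delta)ne^{-\varepsilon t}$$
$$\le \frac{n}{r}(1 + 4\delta)\left(\frac{\varepsilon}{1 - e^{-\varepsilon}} + r\varepsilon\right) < \frac{n}{r}(1 + a).$$

This completes the proof. ∎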


We conclude the section by showing how the theorem quickly implies Rödl's solution
of the Erdős-Hanani problem mentioned at the beginning of the section.

Theorem 4.7.3 [Rödl] For $k$, $l$ fixed,

$$M(n,k,l) \le (1 + o(1))\binom{n}{l}\Big/\binom{k}{l}$$

where the $o(1)$ term tends to zero as $n$ tends to infinity.


Proof. Put $r = \binom{k}{l}$ and let $H$ be the $r$-uniform hypergraph whose vertices are all
$l$-subsets of $\{1, 2, \dots, n\}$, and whose edges are all collections of $\binom{k}{l}$ $l$-tuples that lie
in a $k$-set. $H$ has $\binom{n}{l}$ vertices, each of its vertices has degree $D = \binom{n-l}{k-l}$, and every
two distinct vertices lie in at most $\binom{n-l-1}{k-l-1} = o(D)$ common edges. Therefore, by
Theorem 4.7.1, $H$ has a cover of size at most $(1 + o(1))\binom{n}{l}/\binom{k}{l}$, as needed. ∎
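
The parameters appearing in this proof can be tabulated directly. The sketch below (function names are ours) computes the trivial lower bound $\binom{n}{l}/\binom{k}{l}$ together with $r$, the number of vertices, the degree $D$ and the codegree bound, so that one can watch the ratio of codegree to degree tend to zero.

```python
from fractions import Fraction
from math import comb


def covering_lower_bound(n: int, k: int, l: int) -> Fraction:
    """C(n, l) / C(k, l): each k-set covers C(k, l) of the C(n, l) l-sets."""
    return Fraction(comb(n, l), comb(k, l))


def hypergraph_parameters(n: int, k: int, l: int):
    """Return (r, number of vertices, degree D, codegree bound) for the
    hypergraph whose vertices are l-sets and whose edges are k-sets."""
    r = comb(k, l)
    num_vertices = comb(n, l)
    degree = comb(n - l, k - l)              # k-sets containing a fixed l-set
    codegree = comb(n - l - 1, k - l - 1)    # k-sets containing l+1 fixed points
    return r, num_vertices, degree, codegree


if __name__ == "__main__":
    for n in (10, 100, 1000):
        r, nv, D, cod = hypergraph_parameters(n, 4, 2)
        print(n, covering_lower_bound(n, 4, 2), float(Fraction(cod, D)))
```

For $(n,k,l) = (7,3,2)$ the lower bound is 7, which is attained by the Fano plane, a Steiner Triple System.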


4.8 EXERCISES 


1. Let $X$ be a random variable taking integral nonnegative values, let $E[X^2]$
denote the expectation of its square, and let $\mathrm{Var}[X]$ denote its variance. Prove
that

$$\Pr[X = 0] \le \frac{\mathrm{Var}[X]}{E[X^2]}.$$

2. (*) Show that there is a positive constant $c$ such that the following holds. For
any $n$ reals $a_1, a_2, \dots, a_n$ satisfying $\sum_{i=1}^{n} a_i^2 = 1$, if $(\varepsilon_1, \dots, \varepsilon_n)$ is a $\{-1, 1\}$-random
vector obtained by choosing each $\varepsilon_i$ randomly and independently with
uniform distribution to be either $-1$ or $1$, then

$$\Pr\left[\left|\sum_{i=1}^{n} \varepsilon_i a_i\right| \le 1\right] \ge c.$$

3. (*) Show that there is a positive constant $c$ such that the following holds. For any
$n$ vectors $a_1, a_2, \dots, a_n \in R^2$ satisfying $\sum_{i=1}^{n} \|a_i\|^2 = 1$ and $\|a_i\| \le 1/10$,
where $\|\cdot\|$ denotes the usual Euclidean norm, if $(\varepsilon_1, \dots, \varepsilon_n)$ is a $\{-1, 1\}$-random
vector obtained by choosing each $\varepsilon_i$ randomly and independently with
uniform distribution to be either $-1$ or $1$, then

$$\Pr\left[\left\|\sum_{i=1}^{n} \varepsilon_i a_i\right\| \le \frac{1}{3}\right] \ge c.$$

4. Let $X$ be a random variable with expectation $E[X] = 0$ and variance $\sigma^2$.
Prove that for all $\lambda > 0$,

$$\Pr[X \ge \lambda] \le \frac{\sigma^2}{\sigma^2 + \lambda^2}.$$

5. Let $v_1 = (x_1, y_1), \dots, v_n = (x_n, y_n)$ be $n$ two-dimensional vectors, where
each $x_i$ and each $y_i$ is an integer whose absolute value does not exceed
$2^{n/2}/(100\sqrt{n})$. Show that there are two disjoint sets $I, J \subset \{1, 2, \dots, n\}$
such that

$$\sum_{i \in I} v_i = \sum_{j \in J} v_j.$$

6. (*) Prove that for every set $X$ of at least $4k^2$ distinct residue classes modulo
a prime $p$, there is an integer $a$ such that the set $\{ax \pmod{p} : x \in X\}$
intersects every interval in $\{0, 1, \dots, p-1\}$ of length at least $p/k$.


THE PROBABILISTIC LENS: 
Hamiltonian Paths 


What is the maximum possible number of directed Hamiltonian paths in a tournament
on $n$ vertices? Denote this number by $P(n)$. The first application of the probabilistic
method in combinatorics is the result of Szele (1943) described in Chapter 2, which
states that $P(n) \ge n!/2^{n-1}$. This bound follows immediately from the observation
that the right-hand side is the expected number of such paths in a random tournament
on $n$ vertices. In the same paper Szele shows that

$$\frac{1}{2} \le \lim_{n \to \infty}\left(\frac{P(n)}{n!}\right)^{1/n} \le \frac{1}{2^{3/4}},$$

proves that this limit does exist, and conjectures that its correct value is 1/2.

This conjecture is proved in Alon (1990a). The proof is given below. The main 
tool is the Brégman proof of the Minc Conjecture for the permanent of a (0, 1)-matrix, 
described in The Probabilistic Lens: Brégman Theorem (following Chapter 2). 


Theorem 1 There exists a positive constant $c$ such that for every $n$,

$$P(n) \le cn^{3/2}\frac{n!}{2^{n-1}}.$$


Proof. For a tournament $T$, denote by $P(T)$ the number of directed Hamiltonian
paths of $T$. Similarly, $C(T)$ denotes the number of directed Hamiltonian cycles of
$T$, and $F(T)$ denotes the number of spanning subgraphs of $T$ in which the indegree
and the outdegree of every vertex is exactly 1. Clearly,

$$C(T) \le F(T). \qquad (1)$$

If $T = (V, E)$ is a tournament on a set $V = \{1, 2, \dots, n\}$ of $n$ vertices, the
adjacency matrix of $T$ is the $n$ by $n$ $(0,1)$-matrix $A_T = (a_{ij})$ defined by $a_{ij} = 1$ if
$(i,j) \in E$ and $a_{ij} = 0$ otherwise. Let $r_i$ denote the number of ones in row $i$. Clearly,

$$\sum_{i=1}^{n} r_i = \binom{n}{2}. \qquad (2)$$

By interpreting combinatorially the terms in the expansion of the permanent
$\mathrm{per}(A_T)$, it follows that

$$\mathrm{per}(A_T) = F(T). \qquad (3)$$
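
Identity (3) can be checked on small tournaments. In the sketch below (our own illustration, with hypothetical function names) the permanent is expanded over all permutations, while $F(T)$ is counted independently by choosing one outgoing arc at every vertex and keeping the choices in which every vertex also receives exactly one arc. Both computations are exponential and intended only for very small $n$.

```python
from itertools import permutations, product


def adjacency_matrix(n: int, arcs):
    """0/1 matrix with a[i][j] = 1 exactly when (i, j) is an arc."""
    a = [[0] * n for _ in range(n)]
    for i, j in arcs:
        a[i][j] = 1
    return a


def permanent(matrix) -> int:
    """Permanent by direct expansion over all permutations."""
    n = len(matrix)
    total = 0
    for sigma in permutations(range(n)):
        term = 1
        for i in range(n):
            term *= matrix[i][sigma[i]]
            if term == 0:
                break
        total += term
    return total


def count_cycle_covers(n: int, arcs) -> int:
    """F(T): spanning subgraphs with indegree and outdegree 1 at every vertex."""
    out = [[j for (i, j) in arcs if i == v] for v in range(n)]
    # One out-arc per vertex; valid iff the chosen heads form a permutation.
    return sum(sorted(heads) == list(range(n)) for heads in product(*out))


CYCLIC_TRIANGLE = [(0, 1), (1, 2), (2, 0)]
TRANSITIVE_TRIANGLE = [(0, 1), (0, 2), (1, 2)]
```

For the cyclic triangle both quantities equal 1; for the transitive triangle both are 0, since one vertex has outdegree zero.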


We need the following technical lemma. 
Lemma 2 For every two integers $a, b$ satisfying $b \ge a + 2 > a \ge 1$ the inequality

$$(a!)^{1/a} \cdot (b!)^{1/b} < ((a+1)!)^{1/(a+1)} \cdot ((b-1)!)^{1/(b-1)}$$

holds.


Proof. The assertion is simply that $f(a) < f(b-1)$, for the function $f$ defined by
$f(a) = (a!)^{1/a}/((a+1)!)^{1/(a+1)}$. Thus it suffices to show that for every integer
$x \ge 2$, $f(x-1) < f(x)$. Substituting the expression for $f$ and raising both sides to
the power $x(x-1)(x+1)$ it follows that it suffices to show that for all $x \ge 2$,

$$((x-1)!)^{x(x+1)} \cdot ((x+1)!)^{x(x-1)} < (x!)^{2(x^2-1)};$$

that is,

$$\left(\frac{x^x}{x!}\right)^2 > \left(\frac{x+1}{x}\right)^{x(x-1)}.$$

This is certainly true for $x = 2$. For $x \ge 3$ it follows from the facts that $4^x > e^{x+1}$,
that $x! < ((x+1)/2)^x$ and that $e^{x-1} > ((x+1)/x)^{x(x-1)}$. ∎


Corollary 3 Define $g(x) = (x!)^{1/x}$. For every integer $S \ge n$ the maximum of the
function $\prod_{i=1}^{n} g(x_i)$ subject to the constraints $\sum_{i=1}^{n} x_i = S$ and $x_i \ge 1$ are integers,
is obtained iff the variables $x_i$ are as equal as possible (i.e., iff each $x_i$ is either $\lfloor S/n \rfloor$
or $\lceil S/n \rceil$).


Proof. If there are indices $i$ and $j$ such that $x_i \ge x_j + 2$ then, by Lemma 2, the value
of the product would increase once we add one to $x_j$ and subtract one from $x_i$. ∎
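
Lemma 2 and Corollary 3 can be sanity-checked numerically for small parameters. The sketch below is ours and works with logarithms via `math.lgamma` (which returns $\log\Gamma$, so `lgamma(x + 1)` is $\log x!$); the exhaustive search over partitions is feasible only for small $S$ and $n$.

```python
from math import lgamma


def log_g(x: int) -> float:
    """log of g(x) = (x!)^(1/x)."""
    return lgamma(x + 1) / x


def partitions(total: int, parts: int, minimum: int = 1):
    """Yield nondecreasing tuples of `parts` integers >= minimum summing to `total`."""
    if parts == 1:
        if total >= minimum:
            yield (total,)
        return
    for first in range(minimum, total // parts + 1):
        for rest in partitions(total - first, parts - 1, first):
            yield (first,) + rest


def best_partition(total: int, parts: int):
    """The partition maximizing the product of g(x_i), found by exhaustive search."""
    return max(partitions(total, parts), key=lambda xs: sum(log_g(x) for x in xs))


def lemma2_holds(a: int, b: int) -> bool:
    """Check (a!)^(1/a) (b!)^(1/b) < ((a+1)!)^(1/(a+1)) ((b-1)!)^(1/(b-1))."""
    return log_g(a) + log_g(b) < log_g(a + 1) + log_g(b - 1)
```

For example, `best_partition(10, 3)` returns `(3, 3, 4)`, the most balanced split, as the corollary predicts.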


Returning to our tournament $T$ we observe that the numbers $r_i$ defined above are
precisely the outdegrees of the vertices of $T$. If at least one of these is 0, then clearly
$C(T) = F(T) = 0$. Otherwise, by Brégman's Theorem, by Corollary 3 and by (2)
and (3), $F(T)$ is at most the value of the function $\prod_{i=1}^{n}(r_i!)^{1/r_i}$, where the integral
variables $r_i$ satisfy (2) and are as equal as possible. By a straightforward (though
somewhat tedious) derivation of the asymptotics using Stirling's formula this gives
the following.


Proposition 4 For every tournament $T$ on $n$ vertices,

$$C(T) \le F(T) \le (1 + o(1))\frac{\sqrt{\pi}}{\sqrt{2}\,e}\, n^{3/2}\frac{(n-1)!}{2^n}.$$

To complete the proof of the theorem, we have to derive a bound for the number of
Hamiltonian paths in a tournament from the above result. Given a tournament $S$ on $n$
vertices, let $T$ be the random tournament obtained from $S$ by adding to it a new vertex
$y$ and by orienting each edge connecting $y$ with one of the vertices of $S$, randomly
and independently. For every fixed Hamiltonian path in $S$, the probability that it
can be extended to a Hamiltonian cycle in $T$ is precisely 1/4. Thus the expected
number of Hamiltonian cycles in $T$ is $\frac{1}{4}P(S)$ and hence there is a specific $T$ for
which $C(T) \ge \frac{1}{4}P(S)$. However, by Proposition 4,

$$C(T) \le (1 + o(1))\frac{\sqrt{\pi}}{\sqrt{2}\,e}\,(n+1)^{3/2}\frac{n!}{2^{n+1}},$$

and thus

$$P(S) \le O\!\left(n^{3/2}\frac{n!}{2^{n-1}}\right),$$

completing the proof of Theorem 1. ∎
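
For very small $n$ the quantity $P(n)$ can be found by exhaustive search, and Szele's averaging argument can be verified exactly. The following sketch is our own illustration (all names are hypothetical); it enumerates all $2^{\binom{n}{2}}$ tournaments, so it is practical only for $n \le 5$ or so.

```python
from fractions import Fraction
from itertools import combinations, permutations, product
from math import factorial


def count_hamiltonian_paths(n: int, beats) -> int:
    """beats[i][j] is True when the arc between i and j points from i to j."""
    return sum(
        all(beats[p[i]][p[i + 1]] for i in range(n - 1))
        for p in permutations(range(n))
    )


def all_tournaments(n: int):
    """Yield every tournament on {0..n-1} as a boolean matrix."""
    pairs = list(combinations(range(n), 2))
    for bits in product((False, True), repeat=len(pairs)):
        beats = [[False] * n for _ in range(n)]
        for (i, j), forward in zip(pairs, bits):
            if forward:
                beats[i][j] = True
            else:
                beats[j][i] = True
        yield beats


def path_counts(n: int):
    """Hamiltonian path counts of all tournaments on n labelled vertices."""
    return [count_hamiltonian_paths(n, t) for t in all_tournaments(n)]


def szele_average(n: int) -> Fraction:
    """n! / 2^(n-1), the expected number of Hamiltonian paths."""
    return Fraction(factorial(n), 2 ** (n - 1))
```

The average of `path_counts(n)` equals `szele_average(n)` exactly, and its maximum is $P(n)$; for $n = 4$ the average is 3 while the maximum is 5.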
The Local Lemma


It’s a thing that non-mathematicians don’t realize. Mathematics is actually an 
esthetic subject almost entirely. 


— John Conway 


5.1 THE LEMMA 


In a typical probabilistic proof of a combinatorial result, one usually has to show that 
the probability of a certain event is positive. However, many of these proofs actually 
give more and show that the probability of the event considered is not only positive 
but is large. In fact, most probabilistic proofs deal with events that hold with high 
probability; that is, a probability that tends to 1 as the dimensions of the problem 
grow. For example, consider the proof given in Chapter 1 that for each $k \ge 1$ there
are tournaments in which for every set of $k$ players there is one who beats them
all. The proof actually shows that for every fixed $k$, if the number $n$ of players is
sufficiently large then almost all tournaments with n players satisfy this property; that 
is, the probability that a random tournament with n players has the desired property 
tends to 1 as n tends to infinity. 

On the other hand, there is a trivial case in which one can show that a certain event
holds with positive, though very small, probability. Indeed, if we have $n$ mutually
independent events and each of them holds with probability at least $p > 0$, then
the probability that all events hold simultaneously is at least $p^n$, which is positive,
although it may be exponentially small in $n$.

It is natural to expect that the case of mutual independence can be generalized 
to that of rare dependencies and provide a more general way of proving that certain 
events hold with positive, though small, probability. Such a generalization is, indeed, 
possible and is stated in the following lemma, known as the Lovász Local Lemma.
This simple lemma, first proved in Erdős and Lovász (1975), is an extremely powerful
tool, as it supplies a way for dealing with rare events.


Lemma 5.1.1 [The Local Lemma; General Case] Let $A_1, A_2, \dots, A_n$ be events
in an arbitrary probability space. A directed graph $D = (V, E)$ on the set of
vertices $V = \{1, 2, \dots, n\}$ is called a dependency digraph for the events $A_1, \dots, A_n$
if for each $i$, $1 \le i \le n$, the event $A_i$ is mutually independent of all the events
$\{A_j : (i,j) \notin E\}$. Suppose that $D = (V, E)$ is a dependency digraph for the above
events and suppose there are real numbers $x_1, \dots, x_n$ such that $0 \le x_i < 1$ and
$\Pr[A_i] \le x_i \prod_{(i,j) \in E}(1 - x_j)$ for all $1 \le i \le n$. Then

$$\Pr\left[\bigwedge_{i=1}^{n} \overline{A_i}\right] \ge \prod_{i=1}^{n}(1 - x_i).$$

In particular, with positive probability no event $A_i$ holds.


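Proof. We first prove, by induction on $s$, that for any $S \subset \{1, \dots, n\}$, $|S| = s < n$,
and any $i \notin S$,

$$\Pr\left[A_i \,\Big|\, \bigwedge_{j \in S} \overline{A_j}\right] \le x_i. \qquad (5.1)$$

This is certainly true for $s = 0$. Assuming it holds for all $s' < s$, we prove it for $s$.
Put $S_1 = \{j \in S : (i,j) \in E\}$, $S_2 = S \setminus S_1$. Then

$$\Pr\left[A_i \,\Big|\, \bigwedge_{j \in S} \overline{A_j}\right] = \frac{\Pr\left[A_i \wedge \bigwedge_{j \in S_1} \overline{A_j} \,\Big|\, \bigwedge_{l \in S_2} \overline{A_l}\right]}{\Pr\left[\bigwedge_{j \in S_1} \overline{A_j} \,\Big|\, \bigwedge_{l \in S_2} \overline{A_l}\right]}. \qquad (5.2)$$

To bound the numerator observe that since $A_i$ is mutually independent of the events
$\{A_l : l \in S_2\}$,

$$\Pr\left[A_i \wedge \bigwedge_{j \in S_1} \overline{A_j} \,\Big|\, \bigwedge_{l \in S_2} \overline{A_l}\right] \le \Pr\left[A_i \,\Big|\, \bigwedge_{l \in S_2} \overline{A_l}\right] = \Pr[A_i] \le x_i \prod_{(i,j) \in E}(1 - x_j). \qquad (5.3)$$

The denominator, on the other hand, can be bounded by the induction hypothesis.
Indeed, suppose $S_1 = \{j_1, j_2, \dots, j_r\}$. If $r = 0$ then the denominator is 1, and (5.1)
follows. Otherwise

$$\Pr\left[\overline{A_{j_1}} \wedge \overline{A_{j_2}} \wedge \cdots \wedge \overline{A_{j_r}} \,\Big|\, \bigwedge_{l \in S_2} \overline{A_l}\right]$$
$$= \left(1 - \Pr\left[A_{j_1} \,\Big|\, \bigwedge_{l \in S_2} \overline{A_l}\right]\right) \cdot \left(1 - \Pr\left[A_{j_2} \,\Big|\, \overline{A_{j_1}} \wedge \bigwedge_{l \in S_2} \overline{A_l}\right]\right) \cdots \left(1 - \Pr\left[A_{j_r} \,\Big|\, \overline{A_{j_1}} \wedge \cdots \wedge \overline{A_{j_{r-1}}} \wedge \bigwedge_{l \in S_2} \overline{A_l}\right]\right)$$
$$\ge (1 - x_{j_1})(1 - x_{j_2}) \cdots (1 - x_{j_r}) \ge \prod_{(i,j) \in E}(1 - x_j). \qquad (5.4)$$

Substituting (5.3) and (5.4) into (5.2) we conclude that $\Pr\left[A_i \mid \bigwedge_{j \in S} \overline{A_j}\right] \le x_i$,
completing the proof of the induction.

The assertion of Lemma 5.1.1 now follows easily, as

$$\Pr\left[\bigwedge_{i=1}^{n} \overline{A_i}\right] = (1 - \Pr[A_1]) \cdot \left(1 - \Pr[A_2 \mid \overline{A_1}]\right) \cdots \left(1 - \Pr\left[A_n \,\Big|\, \bigwedge_{i=1}^{n-1} \overline{A_i}\right]\right) \ge \prod_{i=1}^{n}(1 - x_i),$$

completing the proof. ∎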
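
In applications one has to exhibit numbers $x_i$ satisfying the hypothesis of Lemma 5.1.1. The small helper below is a sketch of our own (the name `local_lemma_bound` and the list-based encoding of the dependency digraph are assumptions): it checks the hypothesis for given probabilities, out-neighbourhoods and candidate $x_i$, and returns the guaranteed lower bound $\prod(1 - x_i)$ when the hypothesis holds.

```python
from math import prod


def local_lemma_bound(probs, neighbors, xs):
    """Check the hypothesis of the general Local Lemma.

    probs[i]     -- an upper bound on Pr[A_i]
    neighbors[i] -- the indices j with (i, j) in E
    xs[i]        -- the candidate real number x_i

    Returns prod(1 - x_i) if 0 <= x_i < 1 and
    probs[i] <= x_i * prod_{(i,j) in E} (1 - x_j) for all i, and None otherwise.
    """
    for i, p_i in enumerate(probs):
        if not 0 <= xs[i] < 1:
            return None
        if p_i > xs[i] * prod(1 - xs[j] for j in neighbors[i]):
            return None
    return prod(1 - x for x in xs)


if __name__ == "__main__":
    # Three events, each possibly dependent on the other two.
    nbrs = [[1, 2], [0, 2], [0, 1]]
    print(local_lemma_bound([0.1] * 3, nbrs, [1 / 3] * 3))  # hypothesis holds
    print(local_lemma_bound([0.2] * 3, nbrs, [1 / 3] * 3))  # hypothesis fails
```

A return value of `None` only means that this particular choice of $x_i$ does not work; another choice might.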


Corollary 5.1.2 [The Local Lemma; Symmetric Case] Let $A_1, A_2, \dots, A_n$ be
events in an arbitrary probability space. Suppose that each event $A_i$ is mutually
independent of a set of all the other events $A_j$ but at most $d$, and that $\Pr[A_i] \le p$ for
all $1 \le i \le n$. If

$$ep(d + 1) \le 1 \qquad (5.5)$$

then $\Pr\left[\bigwedge_{i=1}^{n} \overline{A_i}\right] > 0$.


Proof. If $d = 0$ the result is trivial. Otherwise, by the assumption there is a
dependency digraph $D = (V, E)$ for the events $A_1, \dots, A_n$ in which for each $i$,
$|\{j : (i,j) \in E\}| \le d$. The result now follows from Lemma 5.1.1 by taking
$x_i = 1/(d + 1)$ $(< 1)$ for all $i$ and using the fact that for any $d \ge 1$,

$$\left(1 - \frac{1}{d+1}\right)^d > \frac{1}{e}.$$

∎
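
The reduction of the symmetric case to the general one rests on the inequality $(1 - 1/(d+1))^d > 1/e$. The sketch below (our own, with hypothetical function names) checks condition (5.5) and confirms that the choice $x_i = 1/(d+1)$ then satisfies the hypothesis of Lemma 5.1.1.

```python
from math import e


def symmetric_condition(p: float, d: int) -> bool:
    """Condition (5.5): e * p * (d + 1) <= 1."""
    return e * p * (d + 1) <= 1


def general_condition_holds(p: float, d: int) -> bool:
    """Hypothesis of the general lemma with x_i = 1/(d+1) and at most d neighbours:
    p <= x * (1 - x)^d."""
    x = 1 / (d + 1)
    return p <= x * (1 - x) ** d


if __name__ == "__main__":
    for d in (1, 5, 50):
        p = 0.99 / (e * (d + 1))
        print(d, symmetric_condition(p, d), general_condition_holds(p, d),
              (1 - 1 / (d + 1)) ** d, 1 / e)
```

As $d$ grows, $(1 - 1/(d+1))^d$ decreases towards $1/e$, which is why the constant $e$ appears in (5.5).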
It is worth noting that, as shown by Shearer (1985), the constant "e" is the best
possible constant in inequality (5.5). Note also that the proof of Lemma 5.1.1
indicates that the conclusion remains true even when we replace the two assumptions
that each $A_i$ is mutually independent of $\{A_j : (i,j) \notin E\}$ and that for each $i$

$$\Pr[A_i] \le x_i \prod_{(i,j) \in E}(1 - x_j)$$

by the weaker assumption that for each $i$ and each $S_2 \subset \{1, \dots, n\} \setminus \{j : (i,j) \in E\}$,

$$\Pr\left[A_i \,\Big|\, \bigwedge_{j \in S_2} \overline{A_j}\right] \le x_i \prod_{(i,j) \in E}(1 - x_j).$$

This turns out to be useful in certain applications.

In the next few sections we present various applications of the Local Lemma for
obtaining combinatorial results. There is no known proof of any of these results
that does not use the Local Lemma. Additional applications of the Local Lemma
to coloring problems, and much more, can be found in Molloy and Reed
(1999).
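
The choice $x_i = 1/(d+1)$ in the proof of Corollary 5.1.2 can be checked numerically. The
following is a minimal sketch, not part of the original text; the function names are
illustrative assumptions. It compares the threshold $1/(e(d+1))$ of condition (5.5) with the
value $x(1-x)^d$ that Lemma 5.1.1 allows for $x = 1/(d+1)$.

```python
import math

def symmetric_lll_applies(p: float, d: int) -> bool:
    """Return True when condition (5.5), e * p * (d + 1) <= 1, holds."""
    return math.e * p * (d + 1) <= 1

def general_lll_threshold(d: int) -> float:
    """Value x * (1 - x)**d for x = 1/(d+1): the largest Pr[A_i] that
    Lemma 5.1.1 tolerates with this particular (assumed) choice of x_i."""
    x = 1.0 / (d + 1)
    return x * (1 - x) ** d

# The general threshold always exceeds 1/(e(d+1)), which is why (5.5) suffices.
for d in (1, 2, 5, 10, 100):
    print(d, 1 / (math.e * (d + 1)), general_lll_threshold(d))
```

The gap between the two columns shrinks like $1/d^2$, consistent with the remark that
the constant $e$ in (5.5) cannot be improved.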


5.2 PROPERTY B AND MULTICOLORED SETS OF REAL NUMBERS 


Recall that a hypergraph $H = (V, E)$ has property B (i.e., is two-colorable), if there
is a coloring of $V$ by two colors so that no edge $f \in E$ is monochromatic.


Theorem 5.2.1 Let $H = (V, E)$ be a hypergraph in which every edge has at least
$k$ elements, and suppose that each edge of $H$ intersects at most $d$ other edges. If
$e(d+1) \le 2^{k-1}$ then $H$ has property B.


Proof. Color each vertex $v$ of $H$, randomly and independently, either blue or red (with
equal probability). For each edge $f \in E$, let $A_f$ be the event that $f$ is monochromatic.
Clearly $\Pr[A_f] = 2/2^{|f|} \le 1/2^{k-1}$. Moreover, each event $A_f$ is clearly mutually
independent of all the other events $A_{f'}$ for all edges $f'$ that do not intersect $f$. The
result now follows from Corollary 5.1.2. ∎


A special case of Theorem 5.2.1 is that for any $k \ge 9$, any $k$-uniform $k$-regular
hypergraph $H$ has property B. Indeed, since any edge $f$ of such an $H$ contains
$k$ vertices, each of which is incident with $k$ edges (including $f$), it follows that
$f$ intersects at most $d = k(k-1)$ other edges. The desired result follows, since
$e(k(k-1)+1) < 2^{k-1}$ for each $k \ge 9$.
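
The threshold $k \ge 9$ in this special case is simple arithmetic. A short sketch (the function
names are illustrative, not from the text) evaluates the sufficient condition of
Theorem 5.2.1 with $d = k(k-1)$:

```python
import math

def property_b_condition(k: int, d: int) -> bool:
    """Sufficient condition of Theorem 5.2.1: e * (d + 1) <= 2**(k - 1)."""
    return math.e * (d + 1) <= 2 ** (k - 1)

def regular_uniform_ok(k: int) -> bool:
    """k-uniform, k-regular case: each edge meets at most d = k(k-1) others."""
    return property_b_condition(k, k * (k - 1))

for k in range(6, 12):
    print(k, regular_uniform_ok(k))
```

The condition fails for $k = 8$ (where $e \cdot 57 \approx 154.9 > 128$) and holds from $k = 9$ on
(where $e \cdot 73 \approx 198.4 < 256$). This only says the Local Lemma argument stops working
below 9, not that smaller cases lack property B.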

The next result we consider, which appeared in the original paper of Erdős and
Lovász, deals with $k$-colorings of the real numbers. For a $k$-coloring
$c : \mathbb{R} \to \{1, 2, \ldots, k\}$ of the real numbers by the $k$ colors $1, 2, \ldots, k$, and for a subset
$T \subset \mathbb{R}$, we say that $T$ is multicolored (with respect to $c$) if $c(T) = \{1, 2, \ldots, k\}$; that is,
if $T$ contains elements of all colors.


Theorem 5.2.2 Let $m$ and $k$ be two positive integers satisfying

\[ e\bigl(m(m-1)+1\bigr)\,k\,\Bigl(1 - \frac{1}{k}\Bigr)^{m} \le 1. \tag{5.6} \]

Then, for any set $S$ of $m$ real numbers there is a $k$-coloring so that each translation
$x + S$ (for $x \in \mathbb{R}$) is multicolored.


Note that (5.6) holds whenever $m > (3 + o(1))k \log k$.
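
To get a feeling for condition (5.6), one can search for the smallest $m$ satisfying it for a
given $k$. This is a rough numerical sketch with hypothetical helper names; it assumes the
non-strict form of (5.6) and simply scans $m$ upward.

```python
import math

def condition_5_6(m: int, k: int) -> bool:
    """Check e * (m(m-1) + 1) * k * (1 - 1/k)**m <= 1."""
    return math.e * (m * (m - 1) + 1) * k * (1 - 1 / k) ** m <= 1

def smallest_m(k: int, limit: int = 10**6):
    """Smallest m in [1, limit) satisfying (5.6), or None if there is none."""
    for m in range(1, limit):
        if condition_5_6(m, k):
            return m
    return None

for k in (2, 3, 5, 10, 50):
    m = smallest_m(k)
    print(k, m, m / (k * math.log(k)))
```

For $k = 2$ the search returns $m = 9$. The printed ratio $m/(k \ln k)$ approaches 3 only
slowly, since the $o(1)$ term in the note above decays slowly.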


Proof. We first fix a finite subset $X \subset \mathbb{R}$ and show the existence of a $k$-coloring so
that each translation $x + S$ (for $x \in X$) is multicolored. This is an easy consequence
of the Local Lemma. Indeed, put $Y = \bigcup_{x \in X} (x + S)$ and let $c : Y \to \{1, 2, \ldots, k\}$
be a random $k$-coloring of $Y$ obtained by choosing, for each $y \in Y$, randomly
and independently, $c(y) \in \{1, 2, \ldots, k\}$ according to a uniform distribution on
$\{1, 2, \ldots, k\}$. For each $x \in X$, let $A_x$ be the event that $x + S$ is not multicolored (with
respect to $c$). Clearly $\Pr[A_x] \le k(1 - 1/k)^m$. Moreover, each event $A_x$ is mutually
independent of all the other events $A_{x'}$ but those for which $(x + S) \cap (x' + S) \ne \emptyset$.
As there are at most $m(m-1)$ such events, the desired result follows from
Corollary 5.1.2.

We can now prove the existence of a coloring of the set of all reals with the desired
properties, by a standard compactness argument. Since the discrete space with $k$
points is (trivially) compact, Tikhonov's Theorem (which is equivalent to the axiom
of choice) implies that an arbitrary product of such spaces is compact. In particular,
the space of all functions from $\mathbb{R}$ to $\{1, 2, \ldots, k\}$, with the usual product topology,
is compact. In this space, for every fixed $x \in \mathbb{R}$, the set $C_x$ of all colorings $c$ such
that $x + S$ is multicolored is closed. (In fact, it is both open and closed, since a
basis to the open sets is the set of all colorings whose values are prescribed in a finite
number of places.) As we proved above, the intersection of any finite number of sets
$C_x$ is nonempty. It thus follows, by compactness, that the intersection of all sets $C_x$
is nonempty. Any coloring in this intersection has the properties in the conclusion of
Theorem 5.2.2. ∎


Note that it is impossible, in general, to apply the Local Lemma to an infinite
number of events and conclude that in some point of the probability space none
of them holds. In fact, there are trivial examples of countably many mutually
independent events $A_i$ satisfying $\Pr[A_i] = 1/2$ and $\bigwedge_{i \ge 1} \overline{A_i} = \emptyset$. Thus the
compactness argument is essential in the above proof.


5.3 LOWER BOUNDS FOR RAMSEY NUMBERS 


The derivation of lower bounds for Ramsey numbers by Erdős in 1947 was one of the
first applications of the probabilistic method. The Local Lemma provides a simple
way of improving these bounds. Let us obtain, first, a lower bound for the diagonal
Ramsey number $R(k, k)$. Consider a random two-coloring of the edges of $K_n$. For
each set $S$ of $k$ vertices of $K_n$, let $A_S$ be the event that the complete graph on $S$
is monochromatic. Clearly $\Pr[A_S] = 2^{1-\binom{k}{2}}$. It is obvious that each event $A_S$ is
mutually independent of all the events $A_T$ but those which satisfy $|S \cap T| \ge 2$, since
this is the only case in which the corresponding complete graphs share an edge. We
can therefore apply Corollary 5.1.2 with $p = 2^{1-\binom{k}{2}}$ and $d \le \binom{k}{2}\binom{n}{k-2}$ to conclude
the following.


Proposition 5.3.1 If $e\bigl(\binom{k}{2}\binom{n}{k-2} + 1\bigr) \cdot 2^{1-\binom{k}{2}} < 1$ then $R(k,k) > n$.
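
The hypothesis of Proposition 5.3.1 can be evaluated directly for small $k$. The sketch
below is an illustration only; the helper names are assumptions, and floating-point
arithmetic limits it to moderate $k$. It finds the largest $n$ for which the condition holds.

```python
import math

def diagonal_condition(n: int, k: int) -> bool:
    """Check e * (C(k,2) * C(n,k-2) + 1) * 2**(1 - C(k,2)) < 1."""
    d = math.comb(k, 2) * math.comb(n, k - 2)   # bound on the number of dependencies
    inv_p = 2 ** (math.comb(k, 2) - 1)          # 1/p, kept as an exact integer
    return math.e * (d + 1) < inv_p

def largest_n(k: int) -> int:
    """Largest n >= 0 satisfying the condition (it is monotone in n)."""
    n = 0
    while diagonal_condition(n + 1, k):
        n += 1
    return n

for k in (4, 6, 8, 10, 12):
    print(k, largest_n(k))
```

For very small $k$ the resulting bound is weaker than the trivial $R(k,k) > k - 1$ (for
$k = 4$ it only gives $n = 2$); the proposition is of interest asymptotically.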


A short computation shows that this gives $R(k,k) > (\sqrt{2}/e)(1 + o(1))\,k\,2^{k/2}$, only
a factor 2 improvement on the bound obtained by the straightforward probabilistic
method. Although this minor improvement is somewhat disappointing it is certainly
not surprising; the Local Lemma is most powerful when the dependencies between
events are rare, and this is not the case here. Indeed, there is a total number of
$K = \binom{n}{k}$ events considered, and the maximum outdegree $d$ in the dependency
digraph is roughly $\binom{k}{2}\binom{n}{k-2}$. For large $k$ and much larger $n$ (which is the case of
interest for us) we have $d > K^{1-O(1/k)}$, that is, quite a lot of dependencies. On
the other hand, if we consider small sets $S$ (e.g., sets of size 3) we observe that out
of the total $K = \binom{n}{3}$ of them each shares an edge with only $3(n-3) \approx K^{1/3}$.
This suggests that the Local Lemma may be much more significant in improving the
off-diagonal Ramsey numbers $R(k, \ell)$, especially if one of the parameters, say, $\ell$, is
small. Let us consider, for example, following Spencer (1977), the Ramsey number
$R(k,3)$. Here, of course, we have to apply the nonsymmetric form of the Local
Lemma. Let us two-color the edges of $K_n$ randomly and independently, where each
edge is colored blue with probability $p$. For each set of three vertices $T$, let $A_T$ be
the event that the triangle on $T$ is blue. Similarly, for each set of $k$ vertices $S$, let
$B_S$ be the event that the complete graph on $S$ is red. Clearly $\Pr[A_T] = p^3$ and
$\Pr[B_S] = (1-p)^{\binom{k}{2}}$. Construct a dependency digraph for the events $A_T$ and $B_S$
by joining two vertices by edges (in both directions) iff the corresponding complete
graphs share an edge. Clearly each $A_T$-node of the dependency graph is adjacent to
$3(n-3) < 3n$ $A_{T'}$-nodes and to at most $\binom{n}{k}$ $B_{S'}$-nodes. Similarly, each $B_S$-node
is adjacent to at most $\binom{k}{2}(n-2) < k^2 n/2$ $A_T$-nodes and to at most $\binom{n}{k}$ $B_{S'}$-nodes.
It follows from the general case of the Local Lemma (Lemma 5.1.1) that if we can
find a $0 < p < 1$ and two real numbers $0 \le x < 1$ and $0 \le y < 1$ such that

\[ p^3 \le x(1-x)^{3n}(1-y)^{\binom{n}{k}} \]

and

\[ (1-p)^{\binom{k}{2}} \le y(1-x)^{k^2 n/2}(1-y)^{\binom{n}{k}} \]

then $R(k,3) > n$.

Our objective is to find the largest possible $k = k(n)$ for which there is such a
choice of $p$, $x$ and $y$. An elementary (but tedious) computation shows that the best
choice is when $p = c_1 n^{-1/2}$, $k = c_2 n^{1/2} \log n$, $x = c_3/n^{3/2}$ and $y$ so that $\binom{n}{k} y = c_4$.
This gives $R(k,3) > c_5 k^2/\log^2 k$. A similar argument gives $R(k,4) > k^{5/2+o(1)}$.
In both cases the amount of computation required is considerable. However, the hard
work does pay; the bound $R(k,3) > c_5 k^2/\log^2 k$ matches a lower bound of Erdős
proved in 1961 by a highly complicated probabilistic argument. This was improved
to $R(k,3) > c_6 k^2/\log k$ by Kim (1995). The bound above for $R(k,4)$ is better than
any bound for $R(k,4)$ known to be proved without the Local Lemma.


5.4 A GEOMETRIC RESULT 


A family of open unit balls $F$ in the three-dimensional Euclidean space $\mathbb{R}^3$ is called a
$k$-fold covering of $\mathbb{R}^3$ if any point $x \in \mathbb{R}^3$ belongs to at least $k$ balls. In particular, a
1-fold covering is simply called a covering. A $k$-fold covering $F$ is called decomposable
if there is a partition of $F$ into two pairwise disjoint families $F_1$ and $F_2$, each being a
covering of $\mathbb{R}^3$. Mani-Levitska and Pach (1988) constructed, for any integer $k \ge 1$,
a nondecomposable $k$-fold covering of $\mathbb{R}^3$ by open unit balls. On the other hand,
they proved that any $k$-fold covering of $\mathbb{R}^3$ in which no point is covered by more than
$c2^{k/3}$ balls is decomposable. This reveals a somewhat surprising phenomenon: it is
more difficult to decompose coverings that cover some of the points of $\mathbb{R}^3$ too often
than to decompose coverings that cover every point about the same number of times.
The exact statement of the Mani-Levitska–Pach Theorem is the following.


Theorem 5.4.1 Let $F = \{B_i\}_{i \in I}$ be a $k$-fold covering of the three-dimensional
Euclidean space by open unit balls. Suppose, further, that no point of $\mathbb{R}^3$ is contained
in more than $t$ members of $F$. If

\[ e \cdot t^3 2^{18} / 2^{k-1} \le 1 \]

then $F$ is decomposable.


Proof. Define an infinite hypergraph $H = (V(H), E(H))$ as follows. The set of
vertices of $H$, $V(H)$, is simply $F = \{B_i\}_{i \in I}$. For each $x \in \mathbb{R}^3$ let $E_x$ be the set of
balls $B_i \in F$ that contain $x$. The set of edges of $H$, $E(H)$, is simply the set of $E_x$,
with the understanding that when $E_x = E_y$ the edge is taken only once. We claim
each edge $E_x$ intersects less than $t^3 2^{18}$ other edges $E_y$ of $H$. If $x \in B_i$ the center of
$B_i$ is within distance 1 of $x$. If now $B_j \cap B_i \ne \emptyset$ the center of $B_j$ is within distance
three of $x$ and so $B_j$ lies entirely inside the ball of radius four centered at $x$. Such a
$B_j$ covers precisely $4^{-3} = 2^{-6}$ of the volume of that ball. As no point is covered
more than $t$ times there can be at most $2^6 t$ such balls. It is not too difficult to check
that $m$ balls in $\mathbb{R}^3$ cut $\mathbb{R}^3$ into less than $m^3$ connected components so that there are
at most $(2^6 t)^3$ distinct $E_y$ overlapping $E_x$.

Consider, now, any finite subhypergraph $L$ of $H$. Each edge of $L$ has at least $k$
vertices, and it intersects at most $d < t^3 2^{18}$ other edges of $L$. Since, by assumption,
$e(d+1) \le 2^{k-1}$, Theorem 5.2.1 (which is a simple corollary of the Local Lemma)
implies that $L$ is two-colorable. This means that one can color the vertices of $L$ blue
and red so that no edge of $L$ is monochromatic. Since this holds for any finite $L$,
a compactness argument, analogous to the one used in the proof of Theorem 5.2.2,
shows that $H$ is two-colorable. Given a two-coloring of $H$ with no monochromatic
edges, we simply let $F_1$ be the set of all blue balls, and $F_2$ be the set of all red ones.
Clearly each $F_i$ is a covering of $\mathbb{R}^3$, completing the proof of the theorem. ∎


It is worth noting that Theorem 5.4.1 can easily be generalized to higher dimen- 
sions. We omit the detailed statement of this generalization. 


5.5 THE LINEAR ARBORICITY OF GRAPHS 


A linear forest is a forest (i.e., an acyclic simple graph) in which every connected 
component is a path. The linear arboricity la(G) of a graph G is the minimum 
number of linear forests in G, whose union is the set of all edges of G. This notion 
was introduced by Harary as one of the covering invariants of graphs. The following 
conjecture, known as the Linear Arboricity Conjecture, was raised in Akiyama, Exoo 
and Harary (1981). 


Conjecture 5.5.1 [The Linear Arboricity Conjecture] The linear arboricity of
every $d$-regular graph is $\lceil (d+1)/2 \rceil$.


Note that since every $d$-regular graph $G$ on $n$ vertices has $nd/2$ edges, and every
linear forest in it has at most $n - 1$ edges, the inequality

\[ \mathrm{la}(G) \ge \frac{nd}{2(n-1)} > \frac{d}{2} \]

is immediate. Since $\mathrm{la}(G)$ is an integer this gives $\mathrm{la}(G) \ge \lceil (d+1)/2 \rceil$. The
difficulty in Conjecture 5.5.1 lies in proving the converse inequality:
$\mathrm{la}(G) \le \lceil (d+1)/2 \rceil$. Note also that since every graph $G$ with maximum degree $\Delta$ is a
subgraph of a $\Delta$-regular graph (which may have more vertices, as well as more edges
than $G$), the Linear Arboricity Conjecture is equivalent to the statement that the linear
arboricity of every graph $G$ with maximum degree $\Delta$ is at most $\lceil (\Delta + 1)/2 \rceil$.
Although this conjecture received a considerable amount of attention, the best
general result concerning it, proved without any probabilistic arguments, is that
$\mathrm{la}(G) \le \lceil 3\Delta/5 \rceil$ for even $\Delta$ and that $\mathrm{la}(G) \le \lceil (3\Delta + 2)/5 \rceil$ for odd $\Delta$. In this
section we prove that for every $\varepsilon > 0$ there is a $\Delta_0 = \Delta_0(\varepsilon)$ such that for every
$\Delta \ge \Delta_0$, the linear arboricity of every graph with maximum degree $\Delta$ is less than
$(\frac{1}{2} + \varepsilon)\Delta$. This result (with a somewhat more complicated proof) appears in Alon
(1988) and its proof relies heavily on the Local Lemma. We note that this proof
is more complicated than the other proofs given in this chapter and requires certain
preparations, some of which are of independent interest.
It is convenient to deduce the result for undirected graphs from its directed version.
A $d$-regular digraph is a directed graph in which the indegree and the outdegree of
every vertex is precisely $d$. A linear directed forest is a directed graph in which every
connected component is a directed path. The dilinear arboricity $\mathrm{dla}(G)$ of a directed
graph $G$ is the minimum number of linear directed forests in $G$ whose union covers
all edges of $G$. The directed version of the Linear Arboricity Conjecture, first stated
in Nakayama and Peroche (1987), is the following.


Conjecture 5.5.2 For every $d$-regular digraph $D$,

\[ \mathrm{dla}(D) = d + 1. \]


Note that since the edges of any (connected) undirected $2d$-regular graph $G$ can
be oriented along an Euler cycle, so that the resulting oriented digraph is $d$-regular, the
validity of Conjecture 5.5.2 for $d$ implies that of Conjecture 5.5.1 for $2d$.

It is easy to prove that any graph with n vertices and maximum degree d contains 
an independent set of size at least n/(d + 1). The following proposition shows that 
at the price of decreasing the size of such a set by a constant factor we can guarantee 
that it has a certain structure. 


Proposition 5.5.3 Let $H = (V, E)$ be a graph with maximum degree $d$, and let
$V = V_1 \cup V_2 \cup \cdots \cup V_r$ be a partition of $V$ into $r$ pairwise disjoint sets. Suppose
each set $V_i$ is of cardinality $|V_i| \ge 2ed$, where $e$ is the basis of the natural logarithm.
Then there is an independent set of vertices $W \subseteq V$ that contains a vertex from each
$V_i$.


Proof. Clearly we may assume that each set $V_i$ is of cardinality precisely $g = \lceil 2ed \rceil$
(otherwise, simply replace each $V_i$ by a subset of cardinality $g$ of it, and replace $H$
by its induced subgraph on the union of these $r$ new sets). Let us pick from each set
$V_i$ randomly and independently a single vertex according to a uniform distribution.
Let $W$ be the random set of the vertices picked. To complete the proof we show that
with positive probability $W$ is an independent set of vertices in $H$.

For each edge $f$ of $H$, let $A_f$ be the event that $W$ contains both ends of $f$. Clearly,
$\Pr[A_f] \le 1/g^2$. Moreover, if the endpoints of $f$ are in $V_i$ and in $V_j$, then the event
$A_f$ is mutually independent of all the events corresponding to edges whose endpoints
do not lie in $V_i \cup V_j$. Thus there is a dependency digraph for the events in which the
maximum degree is less than $2gd$, and since $e \cdot 2gd \cdot 1/g^2 = 2ed/g \le 1$ we conclude,
by Corollary 5.1.2, that with positive probability none of the events $A_f$ holds. But
this means that $W$ is an independent set containing a vertex from each $V_i$, completing
the proof. ∎
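
The random experiment in this proof is easy to simulate. The sketch below is only an
illustration of the sampling step, using plain rejection sampling; the function name and
the retry loop are assumptions, not part of the proof. The Local Lemma guarantees that
a successful sample exists, but it does not by itself bound the number of retries, so this
is practical only for small instances.

```python
import math
import random

def independent_transversal(adj, parts, rng, max_tries=100_000):
    """Sample one vertex per part until the picks form an independent set.

    adj   -- dict mapping each vertex to the set of its neighbours
    parts -- list of pairwise disjoint lists of vertices (V_1, ..., V_r)
    rng   -- a random.Random instance
    Returns one vertex per part, or None if max_tries is exhausted.
    """
    d = max((len(nbrs) for nbrs in adj.values()), default=0)
    g = max(1, math.ceil(2 * math.e * d))        # g = ceil(2ed), as in the proof
    if any(len(part) < g for part in parts):
        raise ValueError("every part needs at least ceil(2ed) vertices")
    trimmed = [part[:g] for part in parts]       # keep exactly g vertices per part
    for _ in range(max_tries):
        picks = [rng.choice(part) for part in trimmed]
        chosen = set(picks)
        if all(adj[v].isdisjoint(chosen) for v in picks):
            return picks
    return None

# Example: a cycle on 33 vertices (d = 2), split into three arcs of 11 = ceil(4e) vertices.
n = 33
cycle = {v: {(v - 1) % n, (v + 1) % n} for v in range(n)}
arcs = [list(range(0, 11)), list(range(11, 22)), list(range(22, 33))]
print(independent_transversal(cycle, arcs, random.Random(1)))
```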


Proposition 5.5.3 suffices to prove Conjecture 5.5.2 for digraphs with no short 
directed cycle. Recall that the directed girth of a digraph is the minimum length of a 
directed cycle in it. 


Theorem 5.5.4 Let $G = (U, F)$ be a $d$-regular digraph with directed girth $g \ge 8ed$.
Then

\[ \mathrm{dla}(G) = d + 1. \]

Proof. As is well known, $F$ can be partitioned into $d$ pairwise disjoint 1-regular
spanning subgraphs $F_1, \ldots, F_d$ of $G$. [This is an easy consequence of the Hall–König
Theorem; let $H$ be the bipartite graph whose two classes of vertices $A$ and
$B$ are copies of $U$, in which $u \in A$ is joined to $v \in B$ iff $(u,v) \in F$. Since $H$ is
$d$-regular its edges can be decomposed into $d$ perfect matchings, which correspond to
$d$ 1-regular spanning subgraphs of $G$.] Each $F_i$ is a union of vertex disjoint directed
cycles $C_{i1}, C_{i2}, \ldots, C_{ir_i}$. Let $V_1, V_2, \ldots, V_r$ be the sets of edges of all the cycles
$\{C_{ij} : 1 \le i \le d,\ 1 \le j \le r_i\}$. Clearly $V_1, V_2, \ldots, V_r$ is a partition of the set $F$ of
all edges of $G$, and by the girth condition, $|V_i| \ge g \ge 8ed$ for all $1 \le i \le r$. Let $H$
be the line graph of $G$, that is, the graph whose set of vertices is the set $F$ of edges of
$G$ in which two edges are adjacent iff they share a common vertex in $G$. Clearly $H$ is
$(4d-2)$-regular. As the cardinality of each $V_i$ is at least $8ed \ge 2e(4d-2)$, there is, by
Proposition 5.5.3, an independent set of $H$ containing a member from each $V_i$. But
this means that there is a matching $M$ in $G$, containing at least one edge from each
cycle $C_{ij}$ of the 1-factors $F_1, \ldots, F_d$. Therefore $M, F_1 \setminus M, F_2 \setminus M, \ldots, F_d \setminus M$
are $d + 1$ directed forests in $G$ (one of which is a matching) that cover all its edges.
Hence

\[ \mathrm{dla}(G) \le d + 1. \]

As $G$ has $|U| \cdot d$ edges and each directed linear forest can have at most $|U| - 1$ edges,

\[ \mathrm{dla}(G) \ge |U|d/(|U| - 1) > d. \]

Thus $\mathrm{dla}(G) = d + 1$, completing the proof. ∎


The last theorem shows that the assertion of Conjecture 5.5.2 holds for digraphs 
with sufficiently large (directed) girth. In order to deal with digraphs with small girth, 
we show that most of the edges of each regular digraph can be decomposed into a 
relatively small number of almost regular digraphs with high girth. To do this, we 
need the following statement, which is proved using the Local Lemma. 


Lemma 5.5.5 Let $G = (V, E)$ be a $d$-regular directed graph, where $d$ is sufficiently
large, and let $p$ be an integer satisfying $10\sqrt{d} \le p \le 20\sqrt{d}$. Then, there is a
$p$-coloring of the vertices of $G$ by the colors $0, 1, 2, \ldots, p-1$ with the following
property; for each vertex $v \in V$ and each color $i$, the numbers

\[ N^+(v,i) = |\{u \in V : (v,u) \in E \text{ and } u \text{ is colored } i\}| \]

and

\[ N^-(v,i) = |\{u \in V : (u,v) \in E \text{ and } u \text{ is colored } i\}| \]

satisfy

\[ |N^-(v,i) - d/p|,\ |N^+(v,i) - d/p| \le 3\sqrt{d/p}\,\sqrt{\log d}. \tag{5.7} \]


Proof. Let $f : V \to \{0, 1, \ldots, p-1\}$ be a random vertex coloring of $V$ by $p$
colors, where for each $v \in V$, $f(v) \in \{0, 1, \ldots, p-1\}$ is chosen according to a
uniform distribution. For every vertex $v \in V$ and every color $i$, $0 \le i < p$, let $A^+_{v,i}$
be the event that the number $N^+(v,i)$ of neighbors of $v$ in $G$ whose color is $i$ does
not satisfy inequality (5.7). Clearly $N^+(v,i)$ is a binomial random variable with
expectation $d/p$ and standard deviation $\sqrt{(d/p)(1 - 1/p)} < \sqrt{d/p}$. Hence, by the
standard estimates for binomial distribution given in Appendix A, for every $v \in V$
and $0 \le i < p$,

\[ \Pr[A^+_{v,i}] < 1/d^4. \]

Similarly, if $A^-_{v,i}$ is the event that the number $N^-(v,i)$ violates (5.7) then

\[ \Pr[A^-_{v,i}] < 1/d^4. \]

Clearly each of the events $A^+_{v,i}$ or $A^-_{v,i}$ is mutually independent of all the events
$A^+_{u,j}$ or $A^-_{u,j}$ for all vertices $u \in V$ that do not have a common neighbor with $v$
in $G$. Thus there is a dependency digraph for all our events with maximum degree
$\le (2d)^2 \cdot p$. Since $e \cdot (1/d^4)((2d)^2 p + 1) < 1$, Corollary 5.1.2 (i.e., the symmetric
form of the Local Lemma) implies that with positive probability no event $A^+_{v,i}$ or $A^-_{v,i}$
occurs. Hence there is a coloring $f$ that satisfies (5.7) for all $v \in V$ and $0 \le i < p$,
completing the proof. ∎


We are now ready to deal with general regular digraphs. Let $G = (V, E)$ be
an arbitrary $d$-regular digraph. Throughout the argument we assume, whenever it
is needed, that $d$ is sufficiently large. Let $p$ be a prime satisfying
$10d^{1/2} \le p \le 20d^{1/2}$ (it is well known that for every $n$ there is a prime between $n$ and $2n$). By
Lemma 5.5.5 there is a vertex coloring $f : V \to \{0, 1, \ldots, p-1\}$ satisfying (5.7).
For each $i$, $0 \le i < p$, let $G_i = (V, E_i)$ be the spanning subdigraph of $G$ defined
by $E_i = \{(u,v) \in E : f(v) \equiv f(u) + i \pmod p\}$. By inequality (5.7) the
maximum indegree $\Delta_i^-$ and the maximum outdegree $\Delta_i^+$ in each $G_i$ is at most
$(d/p) + 3\sqrt{d/p}\,\sqrt{\log d}$. Moreover, for each $i > 0$, the length of every directed cycle in
$G_i$ is divisible by $p$. Thus the directed girth $g_i$ of $G_i$ is at least $p$. Since each $G_i$ can be
completed, by adding vertices and edges, to a $\Delta_i$-regular digraph with the same girth
$g_i$ and with $\Delta_i = \max(\Delta_i^+, \Delta_i^-)$, and since $g_i > 8e\Delta_i$ (for all sufficiently large $d$),
we conclude, by Theorem 5.5.4, that $\mathrm{dla}(G_i) \le \Delta_i + 1 \le (d/p) + 3\sqrt{d/p}\,\sqrt{\log d} + 1$
for all $1 \le i < p$. For $G_0$, we only apply the trivial inequality

\[ \mathrm{dla}(G_0) \le 2\Delta_0 \le 2\frac{d}{p} + 6\sqrt{\frac{d}{p}}\,\sqrt{\log d} \]

obtained, for example, by embedding $G_0$ as a subgraph of a $\Delta_0$-regular graph,
splitting the edges of this graph into $\Delta_0$ 1-regular spanning subgraphs, and breaking
each of these 1-regular spanning subgraphs into two linear directed forests. The last
two inequalities, together with the fact that $10\sqrt{d} \le p \le 20\sqrt{d}$, imply

\[ \mathrm{dla}(G) \le d + 2\frac{d}{p} + 3\sqrt{pd}\,\sqrt{\log d} + 3\sqrt{\frac{d}{p}}\,\sqrt{\log d} + p - 1 \le d + c \cdot d^{3/4}(\log d)^{1/2}. \]

We have thus proved the following.

Theorem 5.5.6 There is an absolute constant $c > 0$ such that for every $d$-regular
digraph $G$

\[ \mathrm{dla}(G) \le d + cd^{3/4}(\log d)^{1/2}. \]
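
Two steps of the argument leading to this theorem can be written out explicitly. The block
below is a sketch of the routine verification, not taken from the text: the cycle
$v_0 \to v_1 \to \cdots \to v_L = v_0$ and the abbreviation $s$ are notation introduced here
for illustration.

```latex
% (1) Cycle lengths in G_i are divisible by p, for 1 <= i < p.
% Every edge (v_t, v_{t+1}) of G_i satisfies f(v_{t+1}) \equiv f(v_t) + i \pmod p, so
\[
  f(v_0) = f(v_L) \equiv f(v_0) + L\,i \pmod p
  \;\Longrightarrow\; p \mid L\,i
  \;\Longrightarrow\; p \mid L ,
\]
% where the last step uses that p is prime and 0 < i < p.

% (2) Summing the bounds on dla(G_i), with s = \sqrt{d/p}\,\sqrt{\log d}:
\[
  \mathrm{dla}(G) \le \sum_{i=0}^{p-1} \mathrm{dla}(G_i)
  \le \Bigl(2\frac{d}{p} + 6s\Bigr) + (p-1)\Bigl(\frac{d}{p} + 3s + 1\Bigr)
  = d + \frac{d}{p} + 3(p+1)s + p - 1 .
\]
% Since p s = \sqrt{pd}\,\sqrt{\log d} and p = \Theta(\sqrt{d}), every term beyond d
% is O(d^{3/4} (\log d)^{1/2}).
```

Step (1) is the only place where the primality of $p$ is used, and step (2) shows that the
dominant error term comes from $3ps = 3\sqrt{pd}\,\sqrt{\log d}$.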


We note that by being a little more careful, we can improve the error term to
$c' d^{2/3}(\log d)^{1/3}$. Since the edges of any undirected $d = 2f$-regular graph can be
oriented so that the resulting digraph is $f$-regular, and since any $(2f-1)$-regular
undirected graph is a subgraph of a $2f$-regular graph, the last theorem implies the
following.


Theorem 5.5.7 There is an absolute constant $c > 0$ such that for every undirected
$d$-regular graph $G$

\[ \mathrm{la}(G) \le \frac{d}{2} + cd^{3/4}(\log d)^{1/2}. \]

∎


5.6 LATIN TRANSVERSALS 


Following the proof of the Local Lemma we noted that the mutual independency
assumption in this lemma can be replaced by the weaker assumption that the conditional
probability of each event, given the mutual nonoccurrence of an arbitrary set of
events, each nonadjacent to it in the dependency digraph, is sufficiently small. In this
section we describe an application, from Erdős and Spencer (1991), of this modified
version of the lemma. Let $A = (a_{ij})$ be an $n \times n$ matrix with, say, integer entries.
A permutation $\pi$ is called a Latin transversal (of $A$) if the entries $a_{i\pi(i)}$ $(1 \le i \le n)$
are all distinct.


Theorem 5.6.1 Suppose $k \le (n-1)/(4e)$ and suppose that no integer appears in
more than $k$ entries of $A$. Then $A$ has a Latin transversal.
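
The definition and the hypothesis of Theorem 5.6.1 can be made concrete on small
matrices. The following is an illustrative brute-force sketch (exponential in $n$, with
hypothetical function names and 0-based indices); it is not the probabilistic argument of
the proof.

```python
import math
from collections import Counter
from itertools import permutations

def latin_transversal(A):
    """Return a permutation pi (as a tuple) such that the entries A[i][pi[i]]
    are all distinct, or None if no such permutation exists."""
    n = len(A)
    for pi in permutations(range(n)):
        if len({A[i][pi[i]] for i in range(n)}) == n:
            return pi
    return None

def satisfies_hypothesis(A):
    """Check k <= (n - 1)/(4e), where k is the largest number of
    entries of A in which a single value appears."""
    n = len(A)
    k = max(Counter(x for row in A for x in row).values())
    return k <= (n - 1) / (4 * math.e)

cyclic = [[0, 1, 2], [1, 2, 0], [2, 0, 1]]
print(latin_transversal(cyclic))            # some permutation with distinct entries
print(latin_transversal([[1, 2], [2, 1]]))  # no Latin transversal exists here
```

Note that the hypothesis is restrictive for small $n$: even with $k = 1$ it requires
$n \ge 12$, so the two toy matrices above fall outside the scope of the theorem.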


Proof. Let $\pi$ be a random permutation of $\{1, 2, \ldots, n\}$, chosen according to a
uniform distribution among all possible $n!$ permutations. Denote by $T$ the set of
all ordered four-tuples $(i, j, i', j')$ satisfying $i < i'$, $j \ne j'$ and $a_{ij} = a_{i'j'}$. For
each $(i, j, i', j') \in T$, let $A_{iji'j'}$ denote the event that $\pi(i) = j$ and $\pi(i') = j'$.
The existence of a Latin transversal is equivalent to the statement that with positive
probability none of these events hold. Let us define a symmetric digraph (i.e., a
graph) $G$ on the vertex set $T$ by making $(i, j, i', j')$ adjacent to $(p, q, p', q')$ if and
only if $\{i, i'\} \cap \{p, p'\} \ne \emptyset$ or $\{j, j'\} \cap \{q, q'\} \ne \emptyset$. Thus these two four-tuples are
not adjacent iff the four cells $(i,j)$, $(i',j')$, $(p,q)$ and $(p',q')$ occupy four distinct
rows and columns of $A$. The maximum degree of $G$ is less than $4nk$; indeed, for a
given $(i, j, i', j') \in T$ there are at most $4n$ choices of $(s,t)$ with either $s \in \{i, i'\}$ or
$t \in \{j, j'\}$, and for each of these choices of $(s,t)$ there are less than $k$ choices for
$(s', t') \ne (s,t)$ with $a_{st} = a_{s't'}$. Each such four-tuple $(s, t, s', t')$ can be uniquely
represented as $(p, q, p', q')$ with $p < p'$. Since $e \cdot 4nk \cdot [1/n(n-1)] \le 1$, the desired
result follows from the above mentioned strengthening of the symmetric version of
the Local Lemma, if we can show that

\[ \Pr\Bigl[A_{iji'j'} \Bigm| \bigwedge_{S} \overline{A_{pqp'q'}}\Bigr] \le \frac{1}{n(n-1)} \tag{5.8} \]

for any $(i, j, i', j') \in T$ and any set $S$ of members of $T$ that are nonadjacent in $G$ to
$(i, j, i', j')$. By symmetry, we may assume that $i = j = 1$, $i' = j' = 2$ and that hence
none of the $p$'s or $q$'s are either 1 or 2. Let us call a permutation $\pi$ good if it satisfies
$\bigwedge_{S} \overline{A_{pqp'q'}}$, and let $S_{ij}$ denote the set of all good permutations $\pi$ satisfying $\pi(1) = i$
and $\pi(2) = j$. We claim that $|S_{12}| \le |S_{ij}|$ for all $i \ne j$. Indeed, suppose first
that $i, j > 2$. For each good $\pi \in S_{12}$ define a permutation $\pi^*$ as follows. Suppose
$\pi(x) = i$, $\pi(y) = j$. Then define $\pi^*(1) = i$, $\pi^*(2) = j$, $\pi^*(x) = 1$, $\pi^*(y) = 2$
and $\pi^*(t) = \pi(t)$ for all $t \ne 1, 2, x, y$. One can easily check that $\pi^*$ is good,
since the cells $(1,i)$, $(2,j)$, $(x,1)$, $(y,2)$ are not part of any $(p, q, p', q') \in S$. Thus
$\pi^* \in S_{ij}$, and since the mapping $\pi \to \pi^*$ is injective $|S_{12}| \le |S_{ij}|$, as claimed.
Similarly one can define injective mappings showing that $|S_{12}| \le |S_{ij}|$ even when
$\{i, j\} \cap \{1, 2\} \ne \emptyset$. It follows that

\[ \Pr\Bigl[A_{1122} \wedge \bigwedge_{S} \overline{A_{pqp'q'}}\Bigr] \le \Pr\Bigl[A_{1i2j} \wedge \bigwedge_{S} \overline{A_{pqp'q'}}\Bigr] \]

for all $i \ne j$ and hence that

\[ \Pr\Bigl[A_{1122} \Bigm| \bigwedge_{S} \overline{A_{pqp'q'}}\Bigr] \le \frac{1}{n(n-1)}. \]

By symmetry, this implies (5.8) and completes the proof. ∎


5.7 THE ALGORITHMIC ASPECT 


When the probabilistic method is applied to prove that a certain event holds with 
high probability, it often supplies an efficient deterministic, or at least randomized, 
algorithm for the corresponding problem. 

By applying the Local Lemma we often manage to prove that a given event holds 
with positive probability, although this probability may be exponentially small in 
the dimensions of the problem. Consequently, it is not clear if any of these proofs 
can provide polynomial algorithms for the corresponding algorithmic problems. For 
many years there was no known method of converting the proofs of any of the 
examples discussed in this chapter into an efficient algorithm. In 1991 J. Beck
found such a method that works for some of these examples, with a little loss in the
constants.

Beck (1991) demonstrated his method by considering the problem of hypergraph
two-coloring. For simplicity we only describe here the case of fixed edge-size in
which each edge intersects a fixed number of other edges.

Let $n$, $d$ be fixed positive integers. By the $(n,d)$ problem we mean the following:
Given sets $A_1, \ldots, A_N \subseteq \Omega$ with all $|A_i| = n$, such that no set $A_i$ intersects more
than $d$ other sets $A_j$, find a two-coloring of $\Omega$ so that no $A_i$ is monochromatic. When
$e(d+1) < 2^{n-1}$, Theorem 5.2.1 assures us that this problem always does have a
solution. Can we find the coloring in polynomial (in $N$ for fixed $n$, $d$) time? Beck
has given an affirmative answer under somewhat more restrictive assumptions. We
assume $\Omega$ is of the form $\Omega = \{1, \ldots, m\}$, $m \le Nn$, and the initial data structure
consists of a list of the elements of the sets $A_i$ and a list giving for each element $j$
those $i$ for which $j \in A_i$. We let $G$ denote the dependency graph with vertices the
sets $A_i$ and $A_i$, $A_j$ adjacent if they overlap.


Theorem 5.7.1 Let $n$, $d$ be such that, setting $D = d(d-1)^3$, there exists a
decomposition $n = n_1 + n_2 + n_3$ with

\[
\begin{aligned}
16D(1+d) &< 2^{n_1}, \\
16D(1+d) &< 2^{n_2}, \\
2e(1+d) &< 2^{n_3}.
\end{aligned}
\]

Then there is a randomized algorithm with expected running time $O(N(\ln N)^c)$ for
the $(n,d)$ problem, where $c$ is a constant (depending only on $n$ and $d$).


For $\varepsilon < 1/11$ fixed, we note that the above conditions are satisfied, for $n$ sufficiently
large, when $d < 2^{n\varepsilon}$, by taking $n_1 = n_2 \sim 5n/11$ and $n_3 \sim n/11$. We emphasize
again that the algorithmic analysis here is for fixed $n$, $d$ and $N$ approaching infinity,
although the argument can be extended to the nonfixed case as well.
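
For concrete $n$ and $d$ one can test mechanically whether a decomposition as in
Theorem 5.7.1 exists. The sketch below uses a hypothetical helper name and takes the
quantity $D$ of the theorem as an explicit argument rather than recomputing it. It chooses
$n_1 = n_2$ as small as the first two conditions allow and gives the rest to $n_3$; since all
three conditions are monotone, this finds a valid decomposition whenever one exists.

```python
import math

def beck_decomposition(n: int, d: int, D: int):
    """Look for n = n1 + n2 + n3 with 16*D*(1+d) < 2**n1, 16*D*(1+d) < 2**n2
    and 2*e*(1+d) < 2**n3. Returns (n1, n2, n3), or None if impossible.
    D is supplied by the caller, exactly as defined in the theorem."""
    bound = 16 * D * (1 + d)
    n1 = n2 = bound.bit_length()      # smallest t with 2**t > bound
    n3 = n - n1 - n2                  # everything left over goes to n3
    if n3 >= 1 and 2 * math.e * (1 + d) < 2 ** n3:
        return n1, n2, n3
    return None

print(beck_decomposition(19, 2, 2))   # smallest n that works for d = 2, D = 2
print(beck_decomposition(18, 2, 2))   # one less is not enough
```

Even for $d = 2$ the sets must have at least 19 elements, which illustrates the "loss in
the constants" compared with the existence condition $e(d+1) < 2^{n-1}$.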

Beck has given a deterministic algorithm for the $(n,d)$ problem. The randomized
algorithm we give may be derandomized using the techniques of Chapter 16. The
running time remains polynomial but seemingly no longer $N^{1+o(1)}$. Moreover,
the algorithm can even be parallelized using some of the techniques in Chapter 16
together with a certain modification in the algorithm.


Proof. The First Pass. During this pass, points will be either red, blue, uncolored or
saved. We move through the points $j \in \Omega$ sequentially, coloring them red or blue at
random, flipping a fair coin. After each $j$ is colored we check all $A_i \ni j$. If $A_i$ now
has $n_1$ points in one color and no points in the other color we call $A_i$ dangerous. All
uncolored $k \in A_i$ are now considered saved. When saved points $k$ are reached in the
sequential coloring they are not colored but simply skipped over. At the conclusion
of the First Pass points are red, blue or saved. We say a set $A_i$ survives if it does not
have both red and blue points. Let $S \subseteq G$ denote the (random) set of surviving sets.
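
The First Pass is a simple sequential procedure. The following is a minimal sketch
under assumed conventions: points are numbered from 0, colors are stored as strings, and
the function name and signature are illustrative rather than Beck's own formulation.

```python
import random

def first_pass(sets, num_points, n1, rng):
    """Run the First Pass on `sets` (lists of points in range(num_points)).

    Returns (color, dangerous, surviving): color[j] is "red", "blue" or "saved";
    dangerous and surviving hold indices into `sets`."""
    containing = [[] for _ in range(num_points)]   # for each point, the sets containing it
    for i, s in enumerate(sets):
        for j in s:
            containing[j].append(i)

    color = [None] * num_points                    # None means "not reached yet"
    red = [0] * len(sets)
    blue = [0] * len(sets)
    dangerous = set()

    for j in range(num_points):
        if color[j] == "saved":                    # saved points are skipped over
            continue
        color[j] = rng.choice(("red", "blue"))     # fair coin flip
        for i in containing[j]:
            if color[j] == "red":
                red[i] += 1
            else:
                blue[i] += 1
            # n1 points in one color and none in the other: A_i becomes dangerous
            if max(red[i], blue[i]) == n1 and min(red[i], blue[i]) == 0:
                dangerous.add(i)
                for k in sets[i]:
                    if color[k] is None:
                        color[k] = "saved"

    # A set survives if it does not contain both a red and a blue point.
    surviving = [i for i in range(len(sets)) if red[i] == 0 or blue[i] == 0]
    return color, dangerous, surviving

example = [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [8, 9, 10, 11]]
print(first_pass(example, 12, 2, random.Random(0)))
```

The toy parameters do not satisfy the hypotheses of Theorem 5.7.1; they only exercise
the bookkeeping. Each point is visited once and each membership is examined a bounded
number of times, so for fixed $n$ and $d$ the pass runs in time linear in $N$.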


Claim 5.7.2 Almost surely all components $C$ of $G|_S$ have size $O(\ln N)$.


Proof. An $A_i \in S$ may be dangerous or, possibly, many of its points were saved
because neighboring (in $G$) sets were dangerous. The probability of a particular $A_i$
becoming dangerous is at most $2^{1-n_1}$, since for this to occur the first $n_1$ coin flips
determining colors of $j \in A_i$ must come up the same. (We only have inequality
since in addition $n_1$ points of $A_i$ must be reached before being saved.) Let $V$ be an
independent set in $G$; that is, the $A_i \in V$ are mutually disjoint. Then the probability
that all $A_i \in V$ become dangerous is at most $(2^{1-n_1})^{|V|}$, as the coin flips involve
disjoint sets. Now let $V \subseteq G$ be such that all distances between the $A_i \in V$ are at
least 4, distance being the length of the shortest path in $G$. We claim that

\[ \Pr[V \subseteq S] \le (d+1)^{|V|}(2^{1-n_1})^{|V|}. \]

This is because for each $A_i \in V$ there are at most $d + 1$ choices for a dangerous
neighbor $A_{i'}$, giving $(d+1)^{|V|}$ choices for the $A_{i'}$. As the $A_i$ are at least four apart
the $A_{i'}$ cannot be adjacent and so the probability that they are all dangerous is at most
$(2^{1-n_1})^{|V|}$, as claimed.

Call $T \subseteq G$ a 4-tree if the $A_i \in T$ are such that all their mutual distances in $G$
are at least four and so that, drawing an arc between $A_i, A_j \in T$ if their distance is
precisely four, the resulting graph is connected. We first bound the number of 4-trees
of size $u$. The "distance-four" graph defined on $T$ must contain a tree. There are
less than $4^u$ trees (up to isomorphism) on $u$ vertices; now fix one. We can label the
tree $1, \ldots, u$ so that each $j > 1$ is adjacent to some $i < j$. Now consider the number
of $(A^1, \ldots, A^u)$ whose distance-four graph corresponds to this tree. There are $N$
choices for $A^1$. Having chosen $A^i$ for all $i < j$ the set $A^j$ must be at distance four
from $A^i$ in $G$ and there are at most $D$ such points. Hence the number of 4-trees of
size $u$ is at most $4^u N D^{u-1} < N(4D)^u$. For any particular 4-tree $T$ we have already
that $\Pr[T \subseteq S] \le [(d+1)2^{1-n_1}]^u$. Hence the expected number of 4-trees $T \subseteq S$ is
at most

\[ N\bigl[8D(d+1)2^{-n_1}\bigr]^u. \]

As the bracketed term is less than $1/2$ by assumption, for $u = c_1 \ln N$ this term is
$o(1)$. Thus almost surely $G|_S$ will contain no 4-tree of size bigger than $c_1 \ln N$. We
actually want to bound the size of the components $C$ of $G|_S$. A maximal 4-tree $T$
in a component $C$ must have the property that every $A_i \in C$ lies within three of an
$A_j \in T$. There are less than $d^3$ (a constant) $A_i$ within three of any given $A_j$ so that
$c_1 \ln N \ge |T| \ge |C|d^{-3}$ and so (since $d$ is a constant)

\[ |C| \le c_2 \ln N, \]

proving the claim. ∎


If the First Pass leaves components of size larger than $c_2 \ln N$ we simply repeat
the entire procedure. In expected linear time the First Pass is successful. The points
that are red or blue are now fixed. The sets $A_i$ with both red and blue points can now
be ignored. For each surviving $A_i$ fix a subset $B_i$ of $n - n_1$ saved points. It now
suffices to color the saved points so that no $B_i$ is monochromatic. The $B_i$ split into
components of size $O(\ln N)$ and it suffices to color each component separately. On
the Second Pass we apply the method of the First Pass to each component of the $B_i$.
Now we call a set $B_i$ dangerous if it receives $n_2$ points of one color and none of the
other. The Second Pass takes expected time $O(M)$ to color a component of size $M$,
hence an expected time $O(N)$ to color all the components. (For success we require
that a component of size $M$ is broken into components of size at most $c_2 \ln M$. To
avoid trivialities, if $M < \ln\ln N$ we skip the Second Pass for the corresponding
component.) At the end of the Second Pass (still in linear time!) there is a family of
twice surviving sets $C_i \subseteq B_i \subseteq A_i$ of size $n_3$, the largest component of which has
size $O(\ln\ln N)$.

We still need to color these $O(N)$ components of sets of size $n_3$, each component
of size $O(\ln\ln N)$. By the Local Lemma (or directly by Theorem 5.2.1), each of
these components can be two-colored. We now find the two-coloring by brute force!
Examining all two-colorings of a component of size $M$ takes time $O(M2^M)$, which
is $O((\ln N)^{O(1)})$ in our case. Doing this for all components takes time
$O(N(\ln N)^{O(1)})$. This completes the coloring. ∎
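
The brute-force step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `brute_force_two_coloring` and the representation of a component as a list of points plus a list of sets are hypothetical, and no attempt is made to match the $O(M2^M)$ bookkeeping exactly.

```python
from itertools import product

def brute_force_two_coloring(points, sets):
    """Try all 2^M colorings of the M points (hypothetical helper).

    Returns a dict point -> 0/1 under which no set in `sets` is
    monochromatic, or None if no such coloring exists.
    """
    points = list(points)
    for bits in product((0, 1), repeat=len(points)):
        color = dict(zip(points, bits))
        # A set is acceptable when both colors appear on it.
        if all(len({color[p] for p in s}) == 2 for s in sets):
            return color
    return None
```

On a component where the Local Lemma guarantees a two-coloring the search must succeed; on a family with no proper two-coloring (the seven lines of the Fano plane, for instance) it returns `None`.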


We note that with slightly more restrictions on $n, d$, a Third Pass could be made
and then the total time would be $O(N(\ln\ln N)^{O(1)})$. We note also that a similar
technique can be applied for converting several other applications of the Local Lemma
into efficient algorithms.


5.8 EXERCISES 


1. (*) Prove that for every integer d > 1 there is a finite c(d) such that the edges 
of any bipartite graph with maximum degree d in which every cycle has at least 
c(d) edges can be colored by d+ 1 colors so that there are no two adjacent 
edges with the same color and there is no two-colored cycle. 


2. (*) Prove that for every $\epsilon > 0$ there is a finite $l_0 = l_0(\epsilon)$ and an infinite
sequence of bits $a_1, a_2, a_3, \ldots$, where $a_i \in \{0,1\}$, such that for every $l > l_0$
and every $i \ge 1$ the two binary vectors $u = (a_i, a_{i+1}, \ldots, a_{i+l-1})$ and
$v = (a_{i+l}, a_{i+l+1}, \ldots, a_{i+2l-1})$ differ in at least $(\frac{1}{2} - \epsilon)l$ coordinates.


3. Let $G = (V, E)$ be a simple graph and suppose each $v \in V$ is associated with
a set $S(v)$ of colors of size at least $10d$, where $d \ge 1$. Suppose, in addition,
that for each $v \in V$ and $c \in S(v)$ there are at most $d$ neighbors $u$ of $v$ such
that $c$ lies in $S(u)$. Prove that there is a proper coloring of $G$ assigning to each
vertex $v$ a color from its class $S(v)$.


4. Let $G = (V, E)$ be a cycle of length $4n$ and let $V = V_1 \cup V_2 \cup \cdots \cup V_n$ be a
partition of its $4n$ vertices into $n$ pairwise disjoint subsets, each of cardinality
4. Is it true that there must be an independent set of $G$ containing precisely
one vertex from each $V_i$? (Prove or supply a counterexample.)


5. (*) Prove that there is an absolute constant $c > 0$ such that for every $k$ there is
a set $S_k$ of at least $ck \ln k$ integers, such that for every coloring of the integers
by $k$ colors there is an integer $x$ for which the set $x + S_k$ does not intersect all
color classes.


THE PROBABILISTIC LENS: 
Directed Cycles 


Let $D = (V, E)$ be a simple directed graph with minimum outdegree $\delta$ and maximum
indegree $\Delta$.


Theorem 1 [Alon and Linial (1989)] If $e(\Delta\delta + 1)(1 - 1/k)^{\delta} < 1$ then $D$ contains
a (directed, simple) cycle of length $0 \pmod{k}$.


Proof. Clearly we may assume that every outdegree is precisely $\delta$, since otherwise
we can consider a subgraph of $D$ with this property.

Let $f : V \to \{0,1,\ldots,k-1\}$ be a random coloring of $V$, obtained by choosing,
for each $v \in V$, $f(v) \in \{0,\ldots,k-1\}$ independently, according to a uniform
distribution. For each $v \in V$, let $A_v$ denote the event that there is no $u \in V$ with
$(v,u) \in E$ and $f(u) \equiv f(v) + 1 \pmod{k}$. Clearly $\Pr[A_v] = (1 - 1/k)^{\delta}$. One
can easily check that each event $A_v$ is mutually independent of all the events $A_u$ but
those satisfying

$$N^+(v) \cap \left(\{u\} \cup N^+(u)\right) \ne \emptyset ,$$

where here $N^+(v) = \{w \in V : (v,w) \in E\}$. The number of such $u$'s is at
most $\Delta\delta$ and hence, by our assumption and by the Local Lemma (Corollary 5.1.2),
$\Pr\left[\bigwedge_{v \in V} \overline{A_v}\right] > 0$. Thus there is an $f : V \to \{0,1,\ldots,k-1\}$ such that for every
$v \in V$ there is a $u \in V$ with

$$(v,u) \in E \quad\text{and}\quad f(u) \equiv f(v) + 1 \pmod{k}. \qquad (1)$$

Starting at an arbitrary $v = v_0 \in V$ and applying (1) repeatedly we obtain a sequence
$v_0, v_1, v_2, \ldots$ of vertices of $D$ so that $(v_i, v_{i+1}) \in E$ and $f(v_{i+1}) \equiv f(v_i) + 1 \pmod{k}$
for all $i \ge 0$. Let $j$ be the minimum integer so that there is an $l < j$ with
$v_l = v_j$. The cycle $v_l v_{l+1} v_{l+2} \cdots v_j = v_l$ is a directed simple cycle of $D$ whose
length is divisible by $k$. ∎
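
The last step of the proof is constructive once a coloring satisfying (1) is in hand. The following Python sketch walks along successors until a vertex repeats. It assumes the coloring `f` is already given (finding it is the job of the Local Lemma and is not attempted here); the name `cycle_from_coloring` and the dictionary representation of $D$ are illustrative choices.

```python
def cycle_from_coloring(out_neighbors, f, k, start):
    """Follow edges (v, u) with f[u] == (f[v] + 1) % k until a vertex repeats.

    out_neighbors: dict vertex -> iterable of out-neighbors (assumed format).
    f: dict vertex -> color in {0, ..., k-1}, assumed to satisfy property (1).
    Returns the vertices v_l, ..., v_{j-1} of the closed walk found.
    """
    position = {}   # vertex -> index of its first visit
    walk = []
    v = start
    while v not in position:
        position[v] = len(walk)
        walk.append(v)
        successors = [u for u in out_neighbors[v] if f[u] == (f[v] + 1) % k]
        if not successors:
            raise ValueError("coloring does not satisfy property (1) at %r" % (v,))
        v = successors[0]
    return walk[position[v]:]
```

Because the colors increase by one (mod $k$) along every step, returning to the first repeated vertex forces the number of steps to be a multiple of $k$.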


Correlation Inequalities 


You just keep right on thinking there, Butch, that’s what you’re good at. 
— Robert Redford to Paul Newman in Butch Cassidy and the Sundance Kid 


Let $G = (V, E)$ be a random graph on the set of vertices $V = \{1,2,\ldots,n\}$ generated
by choosing, for each $i, j \in V$, $i \ne j$, independently, the pair $\{i,j\}$ to be an edge
with probability $p$, where $0 < p < 1$. Let $H$ be the event that $G$ is Hamiltonian and
let $P$ be the event that $G$ is planar. Suppose one wants to compare the two quantities
$\Pr[P \wedge H]$ and $\Pr[P] \cdot \Pr[H]$. Intuitively, knowing that $G$ is Hamiltonian suggests
that it has many edges and hence seems to indicate that $G$ is less likely to be planar.
Therefore it seems natural to expect that $\Pr[P \mid H] \le \Pr[P]$, implying

$$\Pr[P \wedge H] \le \Pr[H] \cdot \Pr[P] .$$


This inequality, which is, indeed, correct, is a special case of the FKG inequality 
of Fortuin, Kasteleyn and Ginibre (1971). In this chapter we present the proof of 
this inequality and several related results, which deal with the correlation between 
certain events in probability spaces. The proofs of all these results are rather simple, 
and still they supply many interesting consequences. The first inequality of this type 
is due to Harris (1960). A result closer to the ones considered here is a lemma of 
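Kleitman (1966a), stating that if $\mathcal{A}$ and $\mathcal{B}$ are two monotone decreasing families of
subsets of $\{1,2,\ldots,n\}$ (i.e., $A \in \mathcal{A}$ and $A' \subseteq A \Rightarrow A' \in \mathcal{A}$ and, similarly, $B \in \mathcal{B}$
and $B' \subseteq B \Rightarrow B' \in \mathcal{B}$) then

$$|\mathcal{A} \cap \mathcal{B}| \cdot 2^n \ge |\mathcal{A}| \cdot |\mathcal{B}| .$$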


This lemma was followed by many extensions and generalizations until Ahlswede 
and Daykin (1978) obtained a very general result, which implies all these extensions. 
In the next section we present this result and its proof. Some of its many applications 
are discussed in the rest of the chapter. 


6.1 THE FOUR FUNCTIONS THEOREM OF AHLSWEDE AND DAYKIN 


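Suppose $n \ge 1$ and put $N = \{1,2,\ldots,n\}$. Let $P(N)$ denote the set of all subsets
of $N$, and let $R^+$ denote the set of nonnegative real numbers. For a function
$\varphi : P(N) \to R^+$ and for a family $\mathcal{A}$ of subsets of $N$ denote $\varphi(\mathcal{A}) = \sum_{A \in \mathcal{A}} \varphi(A)$.
For two families $\mathcal{A}$ and $\mathcal{B}$ of subsets of $N$ define $\mathcal{A} \cup \mathcal{B} = \{A \cup B : A \in \mathcal{A}, B \in \mathcal{B}\}$
and $\mathcal{A} \cap \mathcal{B} = \{A \cap B : A \in \mathcal{A}, B \in \mathcal{B}\}$.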


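Theorem 6.1.1 [The Four Functions Theorem] Let $\alpha, \beta, \gamma, \delta : P(N) \to R^+$ be
four functions from the set of all subsets of $N$ to the nonnegative reals. If, for every
two subsets $A, B \subseteq N$ the inequality

$$\alpha(A)\beta(B) \le \gamma(A \cup B)\delta(A \cap B) \qquad (6.1)$$

holds, then, for every two families of subsets $\mathcal{A}, \mathcal{B} \subseteq P(N)$,

$$\alpha(\mathcal{A})\beta(\mathcal{B}) \le \gamma(\mathcal{A} \cup \mathcal{B})\delta(\mathcal{A} \cap \mathcal{B}). \qquad (6.2)$$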


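Proof. Observe, first, that we may modify the four functions $\alpha, \beta, \gamma, \delta$ by defining
$\alpha(A) = 0$ for all $A \notin \mathcal{A}$, $\beta(B) = 0$ for all $B \notin \mathcal{B}$, $\gamma(C) = 0$ for all $C \notin \mathcal{A} \cup \mathcal{B}$, and
$\delta(D) = 0$ for all $D \notin \mathcal{A} \cap \mathcal{B}$. Clearly (6.1) still holds for the modified functions and
in inequality (6.2) we may assume now that $\mathcal{A} = \mathcal{B} = \mathcal{A} \cup \mathcal{B} = \mathcal{A} \cap \mathcal{B} = P(N)$.

To prove this inequality we apply induction on $n$. The only step that requires
some computation is $n = 1$. In this case $P(N) = \{\emptyset, N\}$. For each function
$\varphi \in \{\alpha, \beta, \gamma, \delta\}$ define $\varphi_0 = \varphi(\emptyset)$ and $\varphi_1 = \varphi(N)$. By (6.1) we have

$$\begin{aligned}
\alpha_0\beta_0 &\le \gamma_0\delta_0, \\
\alpha_0\beta_1 &\le \gamma_1\delta_0, \\
\alpha_1\beta_0 &\le \gamma_1\delta_0, \\
\alpha_1\beta_1 &\le \gamma_1\delta_1. 
\end{aligned} \qquad (6.3)$$

By the above paragraph we only have to prove inequality (6.2), where $\mathcal{A} = \mathcal{B} =
P(N)$, that is, to prove that

$$(\alpha_0 + \alpha_1)(\beta_0 + \beta_1) \le (\gamma_0 + \gamma_1)(\delta_0 + \delta_1). \qquad (6.4)$$

If either $\gamma_1 = 0$ or $\delta_0 = 0$ this follows immediately from (6.3). Otherwise, by (6.3),
$\gamma_0 \ge \alpha_0\beta_0/\delta_0$ and $\delta_1 \ge \alpha_1\beta_1/\gamma_1$. It thus suffices to show that

$$\left(\frac{\alpha_0\beta_0}{\delta_0} + \gamma_1\right)\left(\delta_0 + \frac{\alpha_1\beta_1}{\gamma_1}\right) \ge (\alpha_0 + \alpha_1)(\beta_0 + \beta_1),$$

or, equivalently, that

$$(\alpha_0\beta_0 + \gamma_1\delta_0)(\delta_0\gamma_1 + \alpha_1\beta_1) \ge (\alpha_0 + \alpha_1)(\beta_0 + \beta_1)\delta_0\gamma_1 .$$

The last inequality is equivalent to

$$(\gamma_1\delta_0 - \alpha_0\beta_1)(\gamma_1\delta_0 - \alpha_1\beta_0) \ge 0,$$

which follows from (6.3), as both factors on the left-hand side are nonnegative. This
completes the proof for $n = 1$.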

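Suppose, now, that the theorem holds for $n - 1$ and let us prove it for $n$ ($\ge 2$).
Put $N' = N \setminus \{n\}$ and define for each $\varphi \in \{\alpha, \beta, \gamma, \delta\}$ and each $A \subseteq N'$,
$\varphi'(A) = \varphi(A) + \varphi(A \cup \{n\})$. Clearly, for each function $\varphi \in \{\alpha, \beta, \gamma, \delta\}$,
$\varphi'(P(N')) = \varphi(P(N))$. Therefore the desired inequality (6.2) would follow from applying the
induction hypothesis to the functions $\alpha', \beta', \gamma', \delta' : P(N') \to R^+$. However, in
order to apply this hypothesis we have to check that these new functions satisfy the
assumption of Theorem 6.1.1 on $N'$; that is, that for every $A', B' \subseteq N'$,

$$\alpha'(A')\beta'(B') \le \gamma'(A' \cup B')\delta'(A' \cap B'). \qquad (6.5)$$

Not surprisingly, this last inequality follows easily from the case $n = 1$, which we
have already proved. Indeed, let $T$ be a 1-element set and define

$$\begin{aligned}
\bar\alpha(\emptyset) &= \alpha(A'), & \bar\alpha(T) &= \alpha(A' \cup \{n\}), \\
\bar\beta(\emptyset) &= \beta(B'), & \bar\beta(T) &= \beta(B' \cup \{n\}), \\
\bar\gamma(\emptyset) &= \gamma(A' \cup B'), & \bar\gamma(T) &= \gamma(A' \cup B' \cup \{n\}), \\
\bar\delta(\emptyset) &= \delta(A' \cap B'), & \bar\delta(T) &= \delta((A' \cap B') \cup \{n\}).
\end{aligned}$$

By the assumption (6.1), $\bar\alpha(S)\bar\beta(R) \le \bar\gamma(S \cup R)\bar\delta(S \cap R)$ for all $S, R \subseteq T$ and
hence, by the case $n = 1$ already proved,

$$\alpha'(A')\beta'(B') = \bar\alpha(P(T))\bar\beta(P(T)) \le \bar\gamma(P(T))\bar\delta(P(T)) = \gamma'(A' \cup B')\delta'(A' \cap B'),$$

which is the desired inequality (6.5). Therefore inequality (6.2) holds, completing
the proof. ∎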


The Ahlswede-Daykin Theorem can be extended to arbitrary finite distributive
lattices. A lattice is a partially ordered set in which every two elements, $x$ and $y$,
have a unique minimal upper bound, denoted by $x \vee y$ and called the join of $x$ and $y$,
and a unique maximal lower bound, denoted by $x \wedge y$ and called the meet of $x$ and $y$.
A lattice $L$ is distributive if for all $x, y, z \in L$,

$$x \wedge (y \vee z) = (x \wedge y) \vee (x \wedge z)$$

or, equivalently, if for all $x, y, z \in L$,

$$x \vee (y \wedge z) = (x \vee y) \wedge (x \vee z) .$$

For two sets $X, Y \subseteq L$ define

$$X \vee Y = \{x \vee y : x \in X, y \in Y\}$$

and

$$X \wedge Y = \{x \wedge y : x \in X, y \in Y\} .$$

Any subset $L$ of $P(N)$, where $N = \{1,2,\ldots,n\}$, ordered by inclusion, which is
closed under the union and intersection operations is a distributive lattice. Here, the
join of two members $A, B \in L$ is simply their union $A \cup B$ and their meet is the
intersection $A \cap B$. It is somewhat more surprising (but easy to check) that every
finite distributive lattice $L$ is isomorphic to a sublattice of $P(\{1,2,\ldots,n\})$ for some
$n$. [To see this, call an element $x \in L$ join-irreducible if whenever $x = y \vee z$
then either $x = y$ or $x = z$. Let $x_1, x_2, \ldots, x_n$ be the set of all join-irreducible
elements in $L$ and associate each element $x \in L$ with the set $A = A(x) \subseteq N$, where
$x = \bigvee_{i \in A} x_i$ and $\{x_i : i \in A\}$ are all the join-irreducibles $y$ satisfying $y \le x$. The
mapping $x \to A(x)$ is the desired isomorphism.] This fact enables us to generalize
Theorem 6.1.1 to arbitrary finite distributive lattices as follows.


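Corollary 6.1.2 Let $L$ be a finite distributive lattice and let $\alpha, \beta, \gamma$ and $\delta$ be four
functions from $L$ to $R^+$. If

$$\alpha(x)\beta(y) \le \gamma(x \vee y)\delta(x \wedge y)$$

for all $x, y \in L$ then for every $X, Y \subseteq L$,

$$\alpha(X)\beta(Y) \le \gamma(X \vee Y)\delta(X \wedge Y) .$$


The simplest case in the last corollary is the case where all the four functions $\alpha$,
$\beta$, $\gamma$ and $\delta$ are identically 1, stated below.


Corollary 6.1.3 Let $L$ be a finite distributive lattice and suppose $X, Y \subseteq L$. Then

$$|X| \cdot |Y| \le |X \vee Y| \cdot |X \wedge Y| .$$


We close this section by presenting a very simple consequence of the last corollary,
first proved by Marica and Schönheim (1969).


Corollary 6.1.4 Let $\mathcal{A}$ be a family of subsets of a finite set $N$ and define

$$\mathcal{A} \setminus \mathcal{A} = \{F \setminus F' : F, F' \in \mathcal{A}\} .$$

Then $|\mathcal{A} \setminus \mathcal{A}| \ge |\mathcal{A}|$.

Proof. Let $L$ be the distributive lattice of all subsets of $N$. By applying Corollary 6.1.3
to $\mathcal{A}$ and $\mathcal{B} = \{N \setminus F : F \in \mathcal{A}\}$ we obtain

$$|\mathcal{A}|^2 = |\mathcal{A}| \cdot |\mathcal{B}| \le |\mathcal{A} \cup \mathcal{B}| \cdot |\mathcal{A} \cap \mathcal{B}| = |\mathcal{A} \setminus \mathcal{A}|^2 .$$

The desired result follows. ∎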

6.2 THE FKG INEQUALITY


A function $\mu : L \to R^+$, where $L$ is a finite distributive lattice, is called
log-supermodular if

$$\mu(x)\mu(y) \le \mu(x \vee y)\mu(x \wedge y)$$

for all $x, y \in L$. A function $f : L \to R^+$ is increasing if $f(x) \le f(y)$ whenever
$x \le y$ and is decreasing if $f(x) \ge f(y)$ whenever $x \le y$.

Motivated by a problem from statistical mechanics, Fortuin et al. (1971) proved 
the following useful inequality, which has become known as the FKG Inequality. 


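Theorem 6.2.1 [The FKG Inequality] Let $L$ be a finite distributive lattice and let
$\mu : L \to R^+$ be a log-supermodular function. Then, for any two increasing functions
$f, g : L \to R^+$ we have

$$\left(\sum_{x \in L} \mu(x) f(x)\right)\left(\sum_{x \in L} \mu(x) g(x)\right) \le \left(\sum_{x \in L} \mu(x) f(x) g(x)\right)\left(\sum_{x \in L} \mu(x)\right) .$$


Proof. Define four functions $\alpha, \beta, \gamma, \delta : L \to R^+$ as follows. For each $x \in L$,

$$\begin{aligned}
\alpha(x) &= \mu(x) f(x), & \beta(x) &= \mu(x) g(x), \\
\gamma(x) &= \mu(x) f(x) g(x), & \delta(x) &= \mu(x).
\end{aligned}$$

We claim that these functions satisfy the hypothesis of the Ahlswede-Daykin Theorem,
stated in Corollary 6.1.2. Indeed, if $x, y \in L$ then, by the log-supermodularity of $\mu$
and since $f$ and $g$ are increasing,

$$\begin{aligned}
\alpha(x)\beta(y) = \mu(x) f(x) \mu(y) g(y) &\le \mu(x \vee y) f(x) g(y) \mu(x \wedge y) \\
&\le \mu(x \vee y) f(x \vee y) g(x \vee y) \mu(x \wedge y) = \gamma(x \vee y)\delta(x \wedge y).
\end{aligned}$$

Therefore by Corollary 6.1.2 (with $X = Y = L$),

$$\alpha(L)\beta(L) \le \gamma(L)\delta(L),$$

which is the desired result. ∎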


Note that the conclusion of Theorem 6.2.1 holds also if both $f$ and $g$ are decreasing
(simply interchange $\gamma$ and $\delta$ in the proof). In case $f$ is increasing and $g$ is decreasing
(or vice versa) the opposite inequality holds:

$$\left(\sum_{x \in L} \mu(x) f(x)\right)\left(\sum_{x \in L} \mu(x) g(x)\right) \ge \left(\sum_{x \in L} \mu(x) f(x) g(x)\right)\left(\sum_{x \in L} \mu(x)\right) .$$

To prove it, simply apply Theorem 6.2.1 to the two increasing functions $f(x)$ and
$k - g(x)$, where $k$ is the constant $\max_{x \in L} g(x)$. [This constant is needed to guarantee
that $k - g(x) \ge 0$ for all $x \in L$.]

It is helpful to view $\mu$ as a measure on $L$. Assuming $\mu$ is not identically zero we
can define, for any function $f : L \to R^+$, its expectation,

$$\langle f \rangle = \frac{\sum_{x \in L} f(x)\mu(x)}{\sum_{x \in L} \mu(x)} .$$

With this notation, the FKG Inequality asserts that if $\mu$ is log-supermodular and
$f, g : L \to R^+$ are both increasing or both decreasing then

$$\langle fg \rangle \ge \langle f \rangle \langle g \rangle .$$

Similarly, if $f$ is increasing and $g$ is decreasing (or vice versa), then

$$\langle fg \rangle \le \langle f \rangle \langle g \rangle .$$


This formulation demonstrates clearly the probabilistic nature of the inequality, some 
of whose many interesting consequences are presented in the rest of this chapter. 
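
As a concrete check of this formulation, the following Python sketch evaluates the expectations exactly on the lattice of subsets of a small set. Everything here is an illustrative assumption: the helper names, the choice of lattice, and the particular weight function used in the usage note are not taken from the text.

```python
from fractions import Fraction
from itertools import combinations

def subsets(n):
    """All subsets of {0, ..., n-1} as frozensets (the lattice P(N))."""
    return [frozenset(c) for r in range(n + 1) for c in combinations(range(n), r)]

def is_log_supermodular(mu, lattice):
    """Check mu(x) mu(y) <= mu(x v y) mu(x ^ y) with join = union, meet = intersection."""
    return all(mu(a) * mu(b) <= mu(a | b) * mu(a & b) for a in lattice for b in lattice)

def expectation(f, mu, lattice):
    """<f> = sum f(x) mu(x) / sum mu(x), computed exactly."""
    return Fraction(sum(f(x) * mu(x) for x in lattice), sum(mu(x) for x in lattice))
```

With, say, the weight $\mu(A) = 2^{|A|^2}$ (which is log-supermodular), the increasing functions $f(A) = |A|$ and $g(A) = 1$ if $0 \in A$ and $0$ otherwise satisfy $\langle fg \rangle \ge \langle f \rangle \langle g \rangle$, while pairing $f$ with the decreasing function $n - |A|$ reverses the inequality.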




6.3 MONOTONE PROPERTIES 


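Recall that a family $\mathcal{A}$ of subsets of $N = \{1,2,\ldots,n\}$ is monotone decreasing if
$A \in \mathcal{A}$ and $A' \subseteq A \Rightarrow A' \in \mathcal{A}$. Similarly, it is monotone increasing if $A \in \mathcal{A}$ and
$A \subseteq A' \Rightarrow A' \in \mathcal{A}$. By considering the power set $P(N)$ as a symmetric probability
space, one naturally defines the probability of $\mathcal{A}$ by

$$\Pr[\mathcal{A}] = \frac{|\mathcal{A}|}{2^n} .$$

Thus $\Pr[\mathcal{A}]$ is simply the probability that a randomly chosen subset of $N$ lies in $\mathcal{A}$.
Kleitman's Lemma, which was the starting point of all the correlation inequalities
considered in this chapter, is the following.


Proposition 6.3.1 Let $\mathcal{A}$ and $\mathcal{B}$ be two monotone increasing families of subsets of
$N = \{1,2,\ldots,n\}$ and let $\mathcal{C}$ and $\mathcal{D}$ be two monotone decreasing families of subsets
of $N$. Then

$$\begin{aligned}
\Pr[\mathcal{A} \cap \mathcal{B}] &\ge \Pr[\mathcal{A}] \cdot \Pr[\mathcal{B}], \\
\Pr[\mathcal{C} \cap \mathcal{D}] &\ge \Pr[\mathcal{C}] \cdot \Pr[\mathcal{D}], \\
\Pr[\mathcal{A} \cap \mathcal{C}] &\le \Pr[\mathcal{A}] \cdot \Pr[\mathcal{C}].
\end{aligned}$$

In terms of cardinalities, this can be read as follows:

$$\begin{aligned}
2^n |\mathcal{A} \cap \mathcal{B}| &\ge |\mathcal{A}| \cdot |\mathcal{B}|, \\
2^n |\mathcal{C} \cap \mathcal{D}| &\ge |\mathcal{C}| \cdot |\mathcal{D}|, \\
2^n |\mathcal{A} \cap \mathcal{C}| &\le |\mathcal{A}| \cdot |\mathcal{C}|,
\end{aligned}$$

where here and in what follows, $\mathcal{A} \cap \mathcal{B}$, $\mathcal{C} \cap \mathcal{D}$ and $\mathcal{A} \cap \mathcal{C}$ denote usual intersections
of families.


Proof. Let $f : P(N) \to R^+$ be the characteristic function of $\mathcal{A}$; that is, $f(A) = 0$ if
$A \notin \mathcal{A}$ and $f(A) = 1$ if $A \in \mathcal{A}$. Similarly, let $g$ be the characteristic function of $\mathcal{B}$.
By the assumptions, $f$ and $g$ are both increasing. Applying the FKG Inequality with
the trivial measure $\mu \equiv 1$ we get

$$\Pr[\mathcal{A} \cap \mathcal{B}] = \langle fg \rangle \ge \langle f \rangle \langle g \rangle = \Pr[\mathcal{A}] \cdot \Pr[\mathcal{B}] .$$

The other two inequalities follow similarly from Theorem 6.2.1 and the paragraph
following it.

It is worth noting that this proposition can be also derived easily from the
Ahlswede-Daykin Theorem or from Corollary 6.1.3. ∎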


The last proposition has several interesting combinatorial consequences, some of 
which appear already in Kleitman’s original paper. Since those are direct combina- 
torial consequences and do not contain any additional probabilistic ideas, we omit 
their exact statement and turn to a version of Proposition 6.3.1 in a more general 
probability space. 

For a real vector $p = (p_1,\ldots,p_n)$, where $0 \le p_i \le 1$, consider the probability
space whose elements are all members of the power set $P(N)$, where, for each
$A \subseteq N$, $\Pr[A] = \prod_{i \in A} p_i \prod_{j \notin A} (1 - p_j)$. Clearly this probability distribution
is obtained if we choose a random $A \subseteq N$ by choosing each element $i \in N$,
independently, with probability $p_i$. Let us denote, for each $\mathcal{A} \subseteq P(N)$, its probability
in this space by $\Pr_p[\mathcal{A}]$. In particular, if all the probabilities $p_i$ are 1/2 then $\Pr_p[\mathcal{A}]$
is the quantity denoted as $\Pr[\mathcal{A}]$ in Proposition 6.3.1. Define $\mu = \mu_p : P(N) \to R^+$
by $\mu(A) = \prod_{i \in A} p_i \prod_{j \notin A} (1 - p_j)$.

It is easy to check that $\mu$ is log-supermodular. This is because for $A, B \subseteq N$,
$\mu(A)\mu(B) = \mu(A \cup B)\mu(A \cap B)$, as can be checked by comparing the contribution
arising from each $i \in N$ to the left-hand side and to the right-hand side of the
last equality. Hence one can apply the FKG Inequality and obtain the following
generalization of Proposition 6.3.1.


Theorem 6.3.2 Let $\mathcal{A}$ and $\mathcal{B}$ be two monotone increasing families of subsets of $N$
and let $\mathcal{C}$ and $\mathcal{D}$ be two monotone decreasing families of subsets of $N$. Then, for any
real vector $p = (p_1,\ldots,p_n)$, $0 \le p_i \le 1$,

$$\begin{aligned}
\Pr_p[\mathcal{A} \cap \mathcal{B}] &\ge \Pr_p[\mathcal{A}] \cdot \Pr_p[\mathcal{B}], \\
\Pr_p[\mathcal{C} \cap \mathcal{D}] &\ge \Pr_p[\mathcal{C}] \cdot \Pr_p[\mathcal{D}], \\
\Pr_p[\mathcal{A} \cap \mathcal{C}] &\le \Pr_p[\mathcal{A}] \cdot \Pr_p[\mathcal{C}].
\end{aligned}$$

This theorem can be applied in many cases and will be used in Chapter 8 to derive
the Janson Inequalities. As a simple illustration suppose that $A_1, A_2, \ldots, A_k$ are
arbitrary subsets of $N$ and one chooses a random subset $A$ of $N$ by choosing each
$i \in N$, independently, with probability $p$. Then, Theorem 6.3.2 easily implies that

$$\Pr[A \text{ intersects each } A_i] \ge \prod_{i=1}^{k} \Pr[A \text{ intersects } A_i] .$$

Note that this is false, in general, for other similar probabilistic models. For example,
if $A$ is a randomly chosen $l$-element subset of $N$ then the last inequality may fail.

By viewing the members of $N$ as the $n = \binom{m}{2}$ edges of the complete graph on the
set of vertices $V = \{1,2,\ldots,m\}$, we can derive a correlation inequality for random
graphs. Let $G = (V, E)$ be a random graph on the set of vertices $V$ generated by
choosing, for each $i, j \in V$, $i \ne j$, independently, the pair $\{i,j\}$ to be an edge with
probability $p$. (This model of random graphs is discussed in detail in Chapter 10.) A
property of graphs is a subset of the set of all graphs on $V$, closed under isomorphism.
Thus, for example, connectivity is a property (corresponding to all connected graphs
on $V$) and planarity is another property. A property $Q$ is monotone increasing if
whenever $G$ has $Q$ and $H$ is obtained from $G$ by adding edges then $H$ has $Q$ too.
A monotone decreasing property is defined in a similar manner. By interpreting the
members of $N$ in Theorem 6.3.2 as the $\binom{m}{2}$ pairs $\{i,j\}$ with $i, j \in V$, $i \ne j$ we obtain
the following.


Theorem 6.3.3 Let $Q_1, Q_2, Q_3$ and $Q_4$ be graph properties, where $Q_1, Q_2$ are
monotone increasing and $Q_3, Q_4$ are monotone decreasing. Let $G = (V, E)$ be a
random graph on $V$ obtained by picking every edge, independently, with probability
$p$. Then

$$\begin{aligned}
\Pr[G \in Q_1 \cap Q_2] &\ge \Pr[G \in Q_1] \cdot \Pr[G \in Q_2], \\
\Pr[G \in Q_3 \cap Q_4] &\ge \Pr[G \in Q_3] \cdot \Pr[G \in Q_4], \\
\Pr[G \in Q_1 \cap Q_3] &\le \Pr[G \in Q_1] \cdot \Pr[G \in Q_3].
\end{aligned}$$


Thus, for example, the probability that G is both Hamiltonian and planar does not 
exceed the product of the probability that it is Hamiltonian by that it is planar. It 
seems hopeless to try and prove such a statement directly, without using one of the 
correlation inequalities. 
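
For very small graphs the inequalities of Theorem 6.3.3 can be confirmed by exhaustive enumeration. The Python sketch below does this on four vertices for two properties chosen purely for illustration: connectivity (monotone increasing) and containing a triangle (monotone increasing, so that triangle-freeness is monotone decreasing). The function names and the fixed vertex set are assumptions of this example.

```python
from fractions import Fraction
from itertools import combinations

VERTICES = range(4)
PAIRS = list(combinations(VERTICES, 2))   # the 6 potential edges

def is_connected(edges):
    """Graph search from vertex 0 over an edge list."""
    reached, frontier = {0}, [0]
    while frontier:
        v = frontier.pop()
        for a, b in edges:
            for u, w in ((a, b), (b, a)):
                if u == v and w not in reached:
                    reached.add(w)
                    frontier.append(w)
    return len(reached) == len(VERTICES)

def has_triangle(edges):
    present = set(edges)
    return any({(a, b), (a, c), (b, c)} <= present
               for a, b, c in combinations(VERTICES, 3))

def probability(event, p):
    """Exact Pr[event] when each pair is an edge independently with probability p."""
    total = Fraction(0)
    for r in range(len(PAIRS) + 1):
        for edges in combinations(PAIRS, r):
            if event(edges):
                total += p ** r * (1 - p) ** (len(PAIRS) - r)
    return total
```

Calling `probability` with a `Fraction` for `p` keeps the comparison exact, so the three inequalities can be asserted without rounding concerns.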


6.4 LINEAR EXTENSIONS OF PARTIALLY ORDERED SETS 


Let $(P, \le)$ be a partially ordered set with $n$ elements. A linear extension of $P$ is
a one to one mapping $\sigma : P \to \{1,2,\ldots,n\}$, which is order preserving; that is, if
$x, y \in P$ and $x \le y$ then $\sigma(x) \le \sigma(y)$. Intuitively, $\sigma$ is a ranking of the elements of
$P$ that preserves the partial order of $P$. Consider the probability space of all linear
extensions of $P$, where each possible extension is equally likely. In this space we can
consider events of the form, for example, $x \le y$ or $(x \le y) \wedge (x \le z)$ (for $x, y, z \in P$)
and compute their probabilities. It turns out that the FKG Inequality is a very useful
tool for studying the correlation between such events. The best known result of this
form was conjectured by Rival and Sands and proved by Shepp (1982). [See also
Fishburn (1992) for a strengthening.] It asserts that for any partially ordered set $P$
and any three elements $x, y, z \in P$: $\Pr[x \le y \wedge x \le z] \ge \Pr[x \le y] \Pr[x \le z]$.

This result became known as the XYZ Theorem. Although it looks intuitively
obvious, its proof is nontrivial and contains a clever application of the FKG Inequality.
In this section we present this result and its elegant proof.


Theorem 6.4.1 Let $P$ be a partially ordered set with $n$ elements $a_1, a_2, \ldots, a_n$. Then

$$\Pr[a_1 \le a_2 \wedge a_1 \le a_3] \ge \Pr[a_1 \le a_2] \Pr[a_1 \le a_3] .$$


Proof. Let $m$ be a large integer (which will later tend to infinity) and let $L$ be
the set of all ordered $n$-tuples $x = (x_1,\ldots,x_n)$, where $x_i \in M = \{1,2,\ldots,m\}$.
(Note that we do not assume that the numbers $x_i$ are distinct.) Define an order
relation $\le$ on $L$ as follows. For $y = (y_1,\ldots,y_n) \in L$ and $x$ as above, $x \le y$
iff $x_1 \ge y_1$ and $x_i - x_1 \le y_i - y_1$ for all $2 \le i \le n$. It is not too difficult
to check that $(L, \le)$ is a lattice in which the $i$th component of the meet $x \wedge y$ is
$(x \wedge y)_i = \min\{x_i - x_1, y_i - y_1\} + \max\{x_1, y_1\}$ and the $i$th component of the join
$x \vee y$ is $(x \vee y)_i = \max\{x_i - x_1, y_i - y_1\} + \min\{x_1, y_1\}$.

Moreover, the lattice $L$ is distributive. This follows by an easy computation from
the fact that the trivial lattice of integers (with respect to the usual order) is distributive
and hence for any three integers $a$, $b$ and $c$,

$$\min\{a, \max\{b,c\}\} = \max\{\min\{a,b\}, \min\{a,c\}\}, \qquad (6.7)$$

and

$$\max\{a, \min\{b,c\}\} = \min\{\max\{a,b\}, \max\{a,c\}\}. \qquad (6.8)$$

Let us show how this implies that $L$ is distributive. Let $x = (x_1,\ldots,x_n)$,
$y = (y_1,\ldots,y_n)$ and $z = (z_1,\ldots,z_n)$ be three elements of $L$. We must show that

$$x \wedge (y \vee z) = (x \wedge y) \vee (x \wedge z) .$$

The $i$th component of $x \wedge (y \vee z)$ is

$$\begin{aligned}
(x \wedge (y \vee z))_i &= \min\{x_i - x_1, (y \vee z)_i - (y \vee z)_1\} + \max\{x_1, (y \vee z)_1\} \\
&= \min\{x_i - x_1, \max\{y_i - y_1, z_i - z_1\}\} + \max\{x_1, \min\{y_1, z_1\}\}.
\end{aligned}$$

Similarly, the $i$th component of $(x \wedge y) \vee (x \wedge z)$ is

$$\begin{aligned}
((x \wedge y) \vee (x \wedge z))_i &= \max\{(x \wedge y)_i - (x \wedge y)_1, (x \wedge z)_i - (x \wedge z)_1\} + \min\{(x \wedge y)_1, (x \wedge z)_1\} \\
&= \max\{\min\{x_i - x_1, y_i - y_1\}, \min\{x_i - x_1, z_i - z_1\}\} + \min\{\max\{x_1, y_1\}, \max\{x_1, z_1\}\}.
\end{aligned}$$

These two quantities are equal, as follows by applying (6.7) with $a = x_i - x_1$,
$b = y_i - y_1$, $c = z_i - z_1$ and (6.8) with $a = x_1$, $b = y_1$, $c = z_1$.

Thus $L$ is distributive. To apply the FKG Inequality we need the measure function
$\mu$ and the two functions $f$ and $g$. Let $\mu$ be the characteristic function of $P$; that
is, for $x = (x_1,\ldots,x_n) \in L$, $\mu(x) = 1$ if $x_i \le x_j$ whenever $a_i \le a_j$ in $P$ and
$\mu(x) = 0$ otherwise. To show that $\mu$ is log-supermodular it suffices to check that if
$\mu(x) = \mu(y) = 1$ then $\mu(x \vee y) = \mu(x \wedge y) = 1$. However, if $\mu(x) = \mu(y) = 1$
and $a_i \le a_j$ in $P$ then $x_i \le x_j$ and $y_i \le y_j$ and hence

$$\begin{aligned}
(x \vee y)_i &= \max\{x_i - x_1, y_i - y_1\} + \min\{x_1, y_1\} \\
&\le \max\{x_j - x_1, y_j - y_1\} + \min\{x_1, y_1\} = (x \vee y)_j,
\end{aligned}$$

that is, $\mu(x \vee y) = 1$. Similarly, $\mu(x) = \mu(y) = 1$ implies $\mu(x \wedge y) = 1$ too.

Not surprisingly, we define the functions $f$ and $g$ as the characteristic functions of
the two events $x_1 \le x_2$ and $x_1 \le x_3$, respectively; that is, $f(x) = 1$ if $x_1 \le x_2$ and
$f(x) = 0$ otherwise, and $g(x) = 1$ if $x_1 \le x_3$ and $g(x) = 0$ otherwise. Trivially, both
$f$ and $g$ are increasing. Indeed, if $x \le y$ and $f(x) = 1$ then $0 \le x_2 - x_1 \le y_2 - y_1$
and hence $f(y) = 1$, and similarly for $g$.

We therefore have all the necessary ingredients for applying the FKG Inequality
(Theorem 6.2.1). This gives that in $L$ the probability that an $n$-tuple $(x_1,\ldots,x_n)$
that satisfies the inequalities in $P$ satisfies both $x_1 \le x_2$ and $x_1 \le x_3$ is at least as big
as the product of the probability that it satisfies $x_1 \le x_2$ by that it satisfies $x_1 \le x_3$.
Note that this is not yet what we wanted to prove; the $n$-tuples in $L$ are not $n$-tuples
of distinct integers and thus do not correspond to linear extensions of $P$. However, as
$m \to \infty$, the probability that $x_i = x_j$ for some $i \ne j$ in a member $x = (x_1,\ldots,x_n)$
of $L$ tends to 0 and the assertion of the theorem follows. ∎
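
For small posets the statement of Theorem 6.4.1 can be checked directly by listing all linear extensions. The Python sketch below is a brute-force illustration only; the function names and the encoding of the partial order as index pairs are assumptions of this example, with index 0 standing for $a_1$, index 1 for $a_2$, and so on.

```python
from fractions import Fraction
from itertools import permutations

def linear_extensions(n, relations):
    """All rankings s (s[i] = rank of element i) with s[i] < s[j] for every
    pair (i, j) in `relations`, i.e. every required order relation."""
    return [s for s in permutations(range(n))
            if all(s[i] < s[j] for i, j in relations)]

def xyz_probabilities(n, relations):
    """Return (Pr[a1<a2 and a1<a3], Pr[a1<a2], Pr[a1<a3]) over uniformly
    random linear extensions, as exact fractions."""
    exts = linear_extensions(n, relations)
    total = len(exts)
    p12 = Fraction(sum(s[0] < s[1] for s in exts), total)
    p13 = Fraction(sum(s[0] < s[2] for s in exts), total)
    both = Fraction(sum(s[0] < s[1] and s[0] < s[2] for s in exts), total)
    return both, p12, p13
```

For the three-element antichain the joint event says that $a_1$ is ranked first, which has probability $1/3$, while each single event has probability $1/2$, so the inequality reads $1/3 \ge 1/4$.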


6.5 EXERCISES 


1. Let $G$ be a graph and let $P$ denote the probability that a random subgraph of
$G$ obtained by picking each edge of $G$ with probability 1/2, independently,
is connected (and spanning). Let $Q$ denote the probability that in a random
two-coloring of $G$, where each edge is chosen, randomly and independently,
to be either red or blue, the red graph and the blue graph are both connected
(and spanning). Is $Q \le P^2$?


2. A family of subsets $\mathcal{G}$ is called intersecting if $G_1 \cap G_2 \ne \emptyset$ for all $G_1, G_2 \in \mathcal{G}$.
Let $\mathcal{F}_1, \mathcal{F}_2, \ldots, \mathcal{F}_k$ be $k$ intersecting families of subsets of $\{1,2,\ldots,n\}$. Prove
that

$$\left|\bigcup_{i=1}^{k} \mathcal{F}_i\right| \le 2^n - 2^{n-k} .$$


3. Show that the probability that in the random graph $G(2k, 1/2)$ the maximum
degree is at most $k - 1$ is at least $1/4^k$.


THE PROBABILISTIC LENS: 
Turán's Theorem


In a graph $G = (V, E)$ let $d_v$ denote the degree of a vertex $v$ and let $\alpha(G)$ be the
maximal size of an independent set of vertices. The following result was proved by
Caro and Wei.


Theorem 1

$$\alpha(G) \ge \sum_{v \in V} \frac{1}{d_v + 1} .$$


Proof. Let $<$ be a uniformly chosen total ordering of $V$. Define

$$I = \{v \in V : \{v,w\} \in E \Rightarrow v < w\} .$$

Let $X_v$ be the indicator random variable for $v \in I$ and $X = \sum_{v \in V} X_v = |I|$. For
each $v$,

$$E[X_v] = \Pr[v \in I] = \frac{1}{d_v + 1},$$

since $v \in I$ if and only if $v$ is the least element among $v$ and its neighbors. Hence

$$E[X] = \sum_{v \in V} \frac{1}{d_v + 1}$$

and so there exists a specific ordering $<$ with

$$|I| \ge \sum_{v \in V} \frac{1}{d_v + 1} .$$

But if $x, y \in I$ and $\{x,y\} \in E$ then $x < y$ and $y < x$, a contradiction. Thus $I$ is
independent and $\alpha(G) \ge |I|$. ∎
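
The proof translates directly into a procedure: pick an ordering and keep every vertex that precedes all of its neighbors. A minimal Python sketch follows, assuming the graph is given as a dictionary from each vertex to the set of its neighbors; the function names are illustrative.

```python
from fractions import Fraction

def first_among_neighbors(adj, order):
    """The set I of vertices that come before all their neighbors in `order`."""
    rank = {v: i for i, v in enumerate(order)}
    return {v for v in adj if all(rank[v] < rank[w] for w in adj[v])}

def caro_wei_bound(adj):
    """The sum over v of 1 / (d_v + 1), computed exactly."""
    return sum(Fraction(1, len(adj[v]) + 1) for v in adj)

def is_independent(adj, vertices):
    return all(w not in vertices for v in vertices for w in adj[v])
```

Averaging $|I|$ over all orderings of a small graph reproduces the bound exactly; on the 5-cycle, for instance, the average over the 120 orderings is $5/3$, so some ordering yields an independent set of size 2.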

For any $m \le n$ let $q, r$ satisfy $n = mq + r$, $0 \le r < m$, and let $e = r\binom{q+1}{2} +
(m - r)\binom{q}{2}$. Define a graph $G = G_{n,e}$ on $n$ vertices and $e$ edges by splitting the
vertex set into $m$ classes as evenly as possible and joining two vertices if and only if
they lie in the same class. Clearly $\alpha(G_{n,e}) = m$.


Theorem 2 [Turán (1941)] Let $H$ have $n$ vertices and $e$ edges. Then $\alpha(H) \ge m$
and $\alpha(H) = m \Leftrightarrow H \cong G_{n,e}$.


Proof. $G_{n,e}$ has $\sum_{v \in V} (d_v + 1)^{-1} = m$ since each clique contributes 1 to the sum.
Fixing $e = \sum_{v \in V} d_v/2$, $\sum_{v \in V} (d_v + 1)^{-1}$ is minimized with the $d_v$ as close together
as possible. Thus for any $H$,

$$\alpha(H) \ge \sum_{v \in V} \frac{1}{d_v + 1} \ge m .$$

For $\alpha(H) = m$ we must have equality on both sides above. The second equality
implies the $d_v$ must be as close together as possible. Letting $X = |I|$ as in the
previous theorem, assume $\alpha(H) = E[X]$. But $\alpha(H) \ge X$ for all values of $<$ so $X$
must be a constant. Suppose $H$ is not a union of cliques. Then, there exist $x, y, z \in V$
with $\{x,y\}, \{x,z\} \in E$, $\{y,z\} \notin E$. Let $<$ be an ordering that begins $x, y, z$ and
$<'$ the same ordering except that it begins $y, z, x$, and let $I, I'$ be the corresponding
sets of vertices all of whose neighbors are "greater." Then $I, I'$ are identical except
that $x \in I$, $y, z \notin I$, whereas $x \notin I'$, $y, z \in I'$. Thus $X$ is not constant. That is,
$\alpha(H) = E[X]$ implies that $H$ is the union of cliques and so $H \cong G_{n,e}$. ∎


Martingales and Tight Concentration


Mathematics seems much more real to me than business — in the sense that, well, 
what’s the reality in a McDonald’s stand? It’s here today and gone tomorrow. 
Now, the integers — that’s reality. When you prove a theorem, you’ve really done 
something that has substance to it, to which no business venture can compare 
for reality. 


— Jim Simons 


7.1 DEFINITIONS 


A martingale is a sequence $X_0, \ldots, X_m$ of random variables so that for $0 \le i < m$,

$$E[X_{i+1} \mid X_i, X_{i-1}, \ldots, X_0] = X_i .$$


Imagine a gambler walking into a casino with $X_0$ dollars. The casino contains a
variety of games of chance. All games are "fair" in that their expectations are zero.
The gambler may allow previous history to determine his choice of game and bet.
He might employ the gambler's definition of martingale: double the bet until you
win. He might play roulette until he wins three times and then switch to keno. Let
$X_i$ be the gambler's fortune at time $i$. Given that $X_i = a$ the conditional expectation
of $X_{i+1}$ must be $a$ and so this is a martingale.


The Probabilistic Method, Third Edition By Noga Alon and Joel Spencer 
Copyright © 2008 John Wiley & Sons, Inc.




A simple but instructive martingale occurs when the gambler plays "flip a coin"
for stakes of one dollar each time. Let $Y_1, \ldots, Y_m$ be independent coin flips, each
$+1$ or $-1$ with probability 1/2. Normalize so that $X_0 = 0$ is the gambler's initial
stake, though he has unlimited credit. Then $X_i = Y_1 + \cdots + Y_i$ has distribution $S_i$.

Our martingales will look quite different, at least from the outside. 


The Edge Exposure Martingale. Let the random graph $G(n,p)$ be the underlying
probability space. Label the potential edges $\{i,j\} \subseteq [n]$ by $e_1, \ldots, e_m$, setting
$m = \binom{n}{2}$ for convenience, in any specific manner. Let $f$ be any graph theoretic
function. We define a martingale $X_0, \ldots, X_m$ by giving the values $X_i(H)$. $X_m(H)$
is simply $f(H)$. $X_0(H)$ is the expected value of $f(G)$ with $G \sim G(n,p)$. Note that
$X_0$ is a constant. In general (including the cases $i = 0$ and $i = m$),

$$X_i(H) = E[f(G) \mid e_j \in G \Leftrightarrow e_j \in H, \ 1 \le j \le i] .$$

In words, to find $X_i(H)$ we first expose the first $i$ pairs $e_1, \ldots, e_i$ and see if they are
in $H$. The remaining edges are not seen and considered to be random. $X_i(H)$ is
then the conditional expectation of $f(G)$ with this partial information. When $i = 0$
nothing is exposed and $X_0$ is a constant. When $i = m$ all is exposed and $X_m$ is the
function $f$. The martingale moves from no information to full information in small
steps.
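
The definition can be evaluated exactly for tiny cases. The Python sketch below computes the whole sequence $X_0(H), \ldots, X_m(H)$ by averaging $f$ over the unexposed edges. It assumes that a graph is encoded as a tuple of 0/1 indicators for $e_1, \ldots, e_m$; the names `edge_exposure` and `chromatic_number_3` are hypothetical helpers for the case $n = m = 3$ with $f$ the chromatic number.

```python
from fractions import Fraction
from itertools import product

def chromatic_number_3(edges):
    """Chromatic number of a graph on 3 vertices given its 3 edge indicators."""
    k = sum(edges)
    return 1 if k == 0 else 3 if k == 3 else 2

def edge_exposure(f, H, p):
    """Return [X_0(H), ..., X_m(H)] for the edge exposure martingale.

    H: tuple of 0/1 edge indicators; p: edge probability (a Fraction).
    """
    m = len(H)
    values = []
    for i in range(m + 1):
        total = Fraction(0)
        # Keep the first i edges as in H, average over all completions.
        for rest in product((0, 1), repeat=m - i):
            weight = Fraction(1)
            for bit in rest:
                weight *= p if bit else 1 - p
            total += weight * f(tuple(H[:i]) + rest)
        values.append(total)
    return values
```

With $p = 1/2$ and $H$ the triangle this gives the sequence $2, 9/4, 5/2, 3$; for the empty graph it gives $2, 7/4, 3/2, 1$.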


[Figure 7.1: tree diagram with levels $X_0$, $X_1$, $X_2$, $X_3$.]

Fig. 7.1 The edge exposure martingale with $n = m = 3$, $f$ is the chromatic number and
the edges exposed in the order "bottom, left, right." The values $X_i(H)$ are given by tracing
from the central node to the leaf labeled $H$.


The figure shows why this is a martingale. The conditional expectation of $f(H)$
knowing the first $i - 1$ edges is the weighted average of the conditional expectations of
$f(H)$, where the $i$th edge has been exposed. More generally (in what is sometimes
referred to as a Doob martingale process) $X_i$ may be the conditional expectation of
$f(H)$ after certain information is revealed as long as the information known at time
$i$ includes the information known at time $i - 1$.

The Vertex Exposure Martingale. Again let $G(n,p)$ be the underlying probability
space and $f$ any graph theoretic function. Define $X_1, \ldots, X_n$ by

$$X_i(H) = E[f(G) \mid \text{for } x, y \le i, \ \{x,y\} \in G \Leftrightarrow \{x,y\} \in H] .$$

In words, to find $X_i(H)$ we expose the first $i$ vertices and all their internal edges
and take the conditional expectation of $f(G)$ with that partial information. By
ordering the edges appropriately the vertex exposure martingale may be considered
a subsequence of the edge exposure martingale. Note that $X_1(H) = E[f(G)]$ is
constant as no edges have been exposed and $X_n(H) = f(H)$ as all edges have been
exposed.


7.2 LARGE DEVIATIONS 


Maurey (1979) applied a large deviation inequality for martingales to prove an
isoperimetric inequality for the symmetric group $S_n$. This inequality was useful in the study
of normed spaces; see Milman and Schechtman (1986) for many related results.
The applications of martingales in graph theory also all involve the same underlying
martingale result used by Maurey, which is the following.


Theorem 7.2.1 [Azuma’s Inequality] Let $0 = X_0, \ldots, X_m$ be a martingale with
$$|X_{i+1} - X_i| \le 1$$
for all $0 \le i < m$. Let $\lambda > 0$ be arbitrary. Then
$$\Pr\left[X_m > \lambda\sqrt{m}\right] < e^{-\lambda^2/2}.$$


In the “flip a coin” martingale $X_m$ has distribution $S_m$ and this result is
Theorem A.1.1. Indeed, the general proof is quite similar.


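Proof. Set, with foresight, $\alpha = \lambda/\sqrt{m}$. Set $Y_i = X_i - X_{i-1}$ so that $|Y_i| \le 1$ and
$E[Y_i \mid X_{i-1}, X_{i-2}, \ldots, X_0] = 0$. Then, as in Theorem A.1.16,


$$E\left[e^{\alpha Y_i} \mid X_{i-1}, X_{i-2}, \ldots, X_0\right] \le \cosh(\alpha) \le e^{\alpha^2/2}.$$


Hence


$$\begin{aligned}
E\left[e^{\alpha X_m}\right] &= E\left[\prod_{i=1}^{m} e^{\alpha Y_i}\right] \\
&= E\left[\left(\prod_{i=1}^{m-1} e^{\alpha Y_i}\right) E\left[e^{\alpha Y_m} \mid X_{m-1}, X_{m-2}, \ldots, X_0\right]\right] \\
&\le E\left[\prod_{i=1}^{m-1} e^{\alpha Y_i}\right] e^{\alpha^2/2} \le e^{\alpha^2 m/2}.
\end{aligned}$$


Therefore
$$\begin{aligned}
\Pr\left[X_m > \lambda\sqrt{m}\right] &= \Pr\left[e^{\alpha X_m} > e^{\alpha\lambda\sqrt{m}}\right] \\
&< E\left[e^{\alpha X_m}\right] e^{-\alpha\lambda\sqrt{m}} \\
&\le e^{\alpha^2 m/2 - \alpha\lambda\sqrt{m}} = e^{-\lambda^2/2}
\end{aligned}$$
as needed. ∎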


Corollary 7.2.2 Let $c = X_0, \ldots, X_m$ be a martingale with
$$|X_{i+1} - X_i| \le 1$$
for all $0 \le i < m$. Then
$$\Pr\left[|X_m - c| > \lambda\sqrt{m}\right] < 2e^{-\lambda^2/2}.$$
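For the “flip a coin” martingale the upper tail can be computed exactly and compared with Azuma's bound. The sketch below is only an illustration under that assumption (fair ±1 steps starting from 0); the function names are hypothetical and not part of the text.

```python
from math import comb, exp, sqrt

def coin_flip_tail(m, lam):
    """Exact Pr[X_m > lam * sqrt(m)] where X_m is a sum of m fair +-1 steps."""
    threshold = lam * sqrt(m)
    # X_m = 2 * heads - m, so count the outcomes with 2 * heads - m > threshold.
    favorable = sum(comb(m, heads) for heads in range(m + 1)
                    if 2 * heads - m > threshold)
    return favorable / 2 ** m

def azuma_bound(lam):
    """Right-hand side of Theorem 7.2.1, exp(-lam^2 / 2)."""
    return exp(-lam ** 2 / 2)
```

For instance, `coin_flip_tail(100, 2.0)` can be compared with `azuma_bound(2.0)`; the exact tail is smaller, as the theorem requires, and the two-sided bound of Corollary 7.2.2 follows by doubling.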


A graph theoretic function f is said to satisfy the edge Lipschitz condition if
whenever H and H′ differ in only one edge then $|f(H) - f(H')| \le 1$. It satisfies
the vertex Lipschitz condition if whenever H and H′ differ at only one vertex,


$$|f(H) - f(H')| \le 1.$$


Theorem 7.2.3 When f satisfies the edge Lipschitz condition the corresponding edge
exposure martingale satisfies $|X_{i+1} - X_i| \le 1$. When f satisfies the vertex Lipschitz
condition the corresponding vertex exposure martingale satisfies $|X_{i+1} - X_i| \le 1$.


We prove these results in a more general context later. They have the intuitive 
sense that if knowledge of a particular vertex or edge cannot change f by more than 
one then exposing a vertex or edge should not change the expectation of f by more 
than one. Now we give a simple application of these results. 


Theorem 7.2.4 [Shamir and Spencer (1987)] Let n, p be arbitrary and let
$c = E[\chi(G)]$, where $G \sim G(n,p)$. Then


$$\Pr\left[|\chi(G) - c| > \lambda\sqrt{n-1}\right] < 2e^{-\lambda^2/2}.$$


Proof. Consider the vertex exposure martingale $X_1, \ldots, X_n$ on G(n, p) with f(G) =
$\chi(G)$. A single vertex can always be given a new color so the vertex Lipschitz
condition applies. Now apply Azuma’s Inequality in the form of Corollary 7.2.2.


Letting $\lambda \to \infty$ arbitrarily slowly, this result shows that the distribution of $\chi(G)$
is “tightly concentrated” around its mean. The proof gives no clue as to where the
mean is.


7.3 CHROMATIC NUMBER


In Theorem 10.3.1 we prove that $\chi(G) \sim n/2\log_2 n$ almost surely, where
$G \sim G(n, \frac12)$. Here we give the original proof of Béla Bollobás using martingales. We
follow the notations of Section 10.3, setting $f(k) = \binom{n}{k} 2^{-\binom{k}{2}}$, $k_0$ so that
$f(k_0 - 1) > 1 > f(k_0)$, $k = k_0 - 4$ so that $k \sim 2\log_2 n$ and $f(k) > n^{3+o(1)}$. Our goal is to show
$$\Pr[\omega(G) < k] = e^{-n^{2+o(1)}},$$
where $\omega(G)$ is the size of the maximum clique of G. We shall actually show in
Theorem 7.3.2 a more precise bound. The remainder of the argument is given in
Section 10.3.

Let Y = Y(H) be the maximal size of a family of edge disjoint cliques of size k
in H. This ingenious and unusual choice of function is key to the martingale proof.


Lemma 7.3.1 $E[Y] \ge (1 + o(1))(n^2/2k^4)$.


Proof. Let $\mathcal{K}$ denote the family of k-cliques of G so that $f(k) = \mu = E[|\mathcal{K}|]$. Let W
denote the number of unordered pairs {A, B} of k-cliques of G with $2 \le |A \cap B| < k$.
Then $E[W] = \Delta/2$, with $\Delta$ as described in Section 10.3 (see also Section 4.5),
$\Delta \sim \mu^2 k^4 n^{-2}$. Let $\mathcal{C}$ be a random subfamily of $\mathcal{K}$ defined by setting, for each
$A \in \mathcal{K}$, $\Pr[A \in \mathcal{C}] = q$, q to be determined. Let W′ be the number of unordered
pairs {A, B}, $A, B \in \mathcal{C}$ with $2 \le |A \cap B| < k$. Then


$$E[W'] = E[W]q^2 = \Delta q^2/2.$$


Delete from $\mathcal{C}$ one set from each such pair {A, B}. This yields a set $\mathcal{C}^*$ of edge
disjoint k-cliques of G and


$$E[Y] \ge E[|\mathcal{C}^*|] \ge E[|\mathcal{C}|] - E[W'] = \mu q - \Delta q^2/2 = \mu^2/2\Delta \sim n^2/2k^4,$$
where we choose $q = \mu/\Delta$ (< 1) to maximize the quadratic. ∎

We conjecture that Lemma 7.3.1 may be improved to $E[Y] > cn^2/k^2$. That is,
with positive probability there is a family of k-cliques that are edge disjoint and cover
a positive proportion of the edges.


Theorem 7.3.2
$$\Pr[\omega(G) < k] < e^{-(c+o(1))(n^2/\ln^8 n)}$$
with c a positive constant.


Proof. Let $Y_0, \ldots, Y_m$, $m = \binom{n}{2}$, be the edge exposure martingale on $G(n, \frac12)$ with
the function Y just defined. The function Y satisfies the edge Lipschitz condition
as adding a single edge can only add at most one clique to a family of edge disjoint
cliques. (Note that the Lipschitz condition would not be satisfied for the number
of k-cliques as a single edge might yield many new cliques.) G has no k-clique
if and only if Y = 0. Apply Azuma’s Inequality with $m = \binom{n}{2} \sim n^2/2$ and
$E[Y] \ge (1 + o(1))(n^2/2k^4)$. Then
$$\begin{aligned}
\Pr[\omega(G) < k] &= \Pr[Y = 0] \\
&\le \Pr\bigl[Y - E[Y] \le -E[Y]\bigr] \\
&\le e^{-E[Y]^2/2\binom{n}{2}} \\
&\le e^{-(c'+o(1))n^2/k^8} \\
&= e^{-(c+o(1))n^2/\ln^8 n},
\end{aligned}$$
as desired. ∎
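To see where the exponent $\ln^8 n$ comes from, the last steps of the proof can be unpacked as follows. This is a worked sketch that uses only $E[Y] \ge (1+o(1))n^2/2k^4$, $\binom{n}{2} \sim n^2/2$ and $k \sim 2\log_2 n$ from the text; the explicit value of the constant is our own arithmetic and is not claimed in the original.

```latex
\frac{E[Y]^2}{2\binom{n}{2}}
  \ge (1+o(1))\,\frac{n^4/4k^8}{n^2}
  = \left(\tfrac14 + o(1)\right)\frac{n^2}{k^8},
\qquad
k^8 \sim \left(\frac{2\ln n}{\ln 2}\right)^{8}
\;\Longrightarrow\;
\frac{E[Y]^2}{2\binom{n}{2}}
  \ge \left(\frac{(\ln 2)^8}{2^{10}} + o(1)\right)\frac{n^2}{\ln^8 n}.
```

Under this reading the constant in Theorem 7.3.2 can be taken to be $c = (\ln 2)^8/2^{10}$.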


Here is another example where the martingale approach requires an inventive 
choice of graph theoretic function. 


Theorem 7.3.3 Let $p = n^{-\alpha}$, where $\alpha > \frac56$ is fixed. Let G = G(n, p). Then there
exists u = u(n, p) so that almost always


$$u \le \chi(G) \le u + 3.$$
That is, $\chi(G)$ is concentrated in four values.


We first require a technical lemma that is well known.


Lemma 7.3.4 Let $\alpha, c$ be fixed, $\alpha > \frac56$. Let $p = n^{-\alpha}$. Then almost always every
$c\sqrt{n}$ vertices of G = G(n, p) may be three-colored.


Proof. If not, let T be a minimal set that is not three-colorable. As T − {x} is
three-colorable, x must have internal degree at least 3 in T for all $x \in T$. Thus if T
has t vertices it must have at least 3t/2 edges. The probability of this occurring for
some T with at most $c\sqrt{n}$ vertices is bounded from above by


$$\sum_{t=4}^{c\sqrt{n}} \binom{n}{t} \binom{\binom{t}{2}}{3t/2} p^{3t/2}.$$

We bound

$$\binom{n}{t} \le \left(\frac{ne}{t}\right)^{t} \quad\text{and}\quad \binom{\binom{t}{2}}{3t/2} \le \left(\frac{te}{3}\right)^{3t/2},$$

so each term is at most


$$\left[\frac{ne}{t} \cdot \frac{t^{3/2}e^{3/2}}{3^{3/2}} \cdot n^{-3\alpha/2}\right]^{t}
\le \left[c_1 n^{1-3\alpha/2} t^{1/2}\right]^{t}
\le \left[c_2 n^{1-3\alpha/2} n^{1/4}\right]^{t}
= \left[c_2 n^{-\epsilon}\right]^{t}$$


with $\epsilon = \frac32\alpha - \frac54 > 0$ and the sum is therefore o(1). ∎

Proof [Theorem 7.3.3]. Let $\epsilon > 0$ be arbitrarily small and let $u = u(n, p, \epsilon)$ be the
least integer so that
$$\Pr[\chi(G) \le u] > \epsilon.$$


Now define Y(G) to be the minimal size of a set of vertices S for which G − S may
be u-colored. This Y satisfies the vertex Lipschitz condition since at worst one could
add a vertex to S. Apply the vertex exposure martingale on G(n, p) to Y. Letting
$\mu = E[Y]$,


$$\Pr\left[Y \le \mu - \lambda\sqrt{n-1}\right] < e^{-\lambda^2/2},$$
$$\Pr\left[Y \ge \mu + \lambda\sqrt{n-1}\right] < e^{-\lambda^2/2}.$$


Let $\lambda$ satisfy $e^{-\lambda^2/2} = \epsilon$ so that these tail events each have probability less than $\epsilon$.
We defined u so that, with probability at least $\epsilon$, G would be u-colorable and hence
Y = 0. That is, $\Pr[Y = 0] > \epsilon$. The first inequality therefore forces $\mu \le \lambda\sqrt{n-1}$.
Now employing the second inequality,


$$\Pr\left[Y \ge 2\lambda\sqrt{n-1}\right] \le \Pr\left[Y \ge \mu + \lambda\sqrt{n-1}\right] \le \epsilon.$$


With probability at least $1 - \epsilon$ there is a u-coloring of all but at most $c'\sqrt{n}$ vertices.
By the lemma almost always, and so with probability at least $1 - \epsilon$, these points may
be colored with three further colors, giving a (u + 3)-coloring of G. The minimality
of u guarantees that with probability at least $1 - \epsilon$ at least u colors are needed for G.
Altogether

$$\Pr[u \le \chi(G) \le u + 3] \ge 1 - 3\epsilon,$$


and $\epsilon$ was arbitrarily small. ∎
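The union bound in the proof of Lemma 7.3.4 can be evaluated numerically to see how quickly it decays. The sketch below is an illustration only: the function names are hypothetical, the sum is assumed to start at t = 4 (the smallest possible non-three-colorable set), and for odd t the edge count 3t/2 is rounded up to an integer, a detail the text does not spell out.

```python
from math import ceil, exp, lgamma, log, sqrt

def log_binom(a, b):
    """Natural log of the binomial coefficient C(a, b); -inf when b > a."""
    if b < 0 or b > a:
        return float("-inf")
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)

def union_bound(n, alpha, c=1.0):
    """Sum over 4 <= t <= c*sqrt(n) of C(n,t) * C(C(t,2), e_t) * p^e_t, e_t = ceil(3t/2)."""
    log_p = -alpha * log(n)  # p = n^(-alpha)
    total = 0.0
    for t in range(4, int(c * sqrt(n)) + 1):
        edges = ceil(3 * t / 2)
        log_term = (log_binom(n, t)
                    + log_binom(t * (t - 1) // 2, edges)
                    + edges * log_p)
        total += exp(log_term)  # terms are computed in log space to avoid overflow
    return total
```

With α = 0.9 and c = 1 the bound is already very small at n = 10^4 and smaller still at n = 10^6, in line with the o(1) claim.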


Using the same technique, similar results can be achieved for other values of $\alpha$.
Together with some related ideas it can be shown that for any fixed $\alpha > \frac12$, $\chi(G)$ is
concentrated on at most two values. See Łuczak (1991) and Alon and Krivelevich
(1997) for the detailed proofs.


7.4 TWO GENERAL SETTINGS 


The martingales useful in studying random graphs generally can be placed in the
following general setting, which is essentially the one considered in Maurey (1979)
and in Milman and Schechtman (1986). Let $\Omega = A^B$ denote the set of functions
$g : B \to A$. (With B the set of pairs of vertices on n vertices and A = {0, 1} we
may identify $g \in A^B$ with a graph on n vertices.) We define a measure by giving
values $p_{ab}$ and setting

$$\Pr[g(b) = a] = p_{ab}$$


with the values g(b) assumed mutually independent. [In G(n, p) all $p_{1b} = p$, $p_{0b} = 1 - p$.]
Now fix a gradation


$$\emptyset = B_0 \subset B_1 \subset \cdots \subset B_m = B.$$

Let $L : A^B \to R$ be a functional (e.g., clique number). We define a martingale
$X_0, X_1, \ldots, X_m$ by setting


$$X_i(h) = E\left[L(g) \mid g(b) = h(b) \text{ for all } b \in B_i\right].$$


$X_0$ is a constant, the expected value of L of the random g. $X_m$ is L itself. The values
$X_i(g)$ approach L(g) as the values of g(b) are “exposed.” We say the functional L
satisfies the Lipschitz condition relative to the gradation if for all $0 \le i < m$,


$$h, h' \text{ differ only on } B_{i+1} - B_i \;\Rightarrow\; |L(h') - L(h)| \le 1.$$


Theorem 7.4.1 Let L satisfy the Lipschitz condition. Then the corresponding martingale
satisfies
$$|X_{i+1}(h) - X_i(h)| \le 1$$


for all $0 \le i < m$, $h \in A^B$.


Proof. Let H be the family of h′ that agree with h on $B_{i+1}$. Then


$$X_{i+1}(h) = \sum_{h' \in H} L(h') w_{h'},$$


where $w_{h'}$ is the conditional probability that g = h′ given that g = h on $B_{i+1}$. For
each $h' \in H$ let H[h′] denote the family of h* that agree with h′ on all points except
(possibly) $B_{i+1} - B_i$. The H[h′] partition the family of h* agreeing with h on $B_i$.
Thus we may express


$$X_i(h) = \sum_{h' \in H} \sum_{h^* \in H[h']} \left[L(h^*) q_{h^*}\right] w_{h'},$$


where $q_{h^*}$ is the conditional probability that g agrees with h* on $B_{i+1}$ given that it
agrees with h on $B_i$. (This is because for $h^* \in H[h']$, $w_{h'}$ is also the conditional
probability that g = h* given that g = h* on $B_{i+1}$.) Thus


$$\begin{aligned}
|X_{i+1}(h) - X_i(h)| &= \left|\sum_{h' \in H} w_{h'} \Bigl[L(h') - \sum_{h^* \in H[h']} L(h^*) q_{h^*}\Bigr]\right| \\
&\le \sum_{h' \in H} w_{h'} \sum_{h^* \in H[h']} \left|q_{h^*}\bigl[L(h') - L(h^*)\bigr]\right|.
\end{aligned}$$


The Lipschitz condition gives $|L(h') - L(h^*)| \le 1$ so


$$|X_{i+1}(h) - X_i(h)| \le \sum_{h' \in H} w_{h'} \sum_{h^* \in H[h']} q_{h^*} = \sum_{h' \in H} w_{h'} = 1.$$


Now we can express Azuma’s Inequality in a general form.

Theorem 7.4.2 Let L satisfy the Lipschitz condition relative to a gradation of length
m and let $\mu = E[L(g)]$. Then for all $\lambda > 0$,


$$\Pr\left[L(g) \ge \mu + \lambda\sqrt{m}\right] < e^{-\lambda^2/2},$$
$$\Pr\left[L(g) \le \mu - \lambda\sqrt{m}\right] < e^{-\lambda^2/2}.$$


The second general setting is taken from Alon, Kim and Spencer (1997). We
assume our underlying probability space is generated by a finite set of mutually
independent Yes/No choices, indexed by $i \in I$. We are given a random variable Y
on this space. Let $p_i$ denote the probability that choice i is Yes. Let $c_i$ be such that
changing choice i (keeping all else the same) can change Y by at most $c_i$. We call $c_i$
the effect of i. Let C be an upper bound on all $c_i$. We call $p_i(1 - p_i)c_i^2$ the variance
of choice i.

Now consider a solitaire game in which Paul finds the value of Y by making
queries of an always truthful oracle Carole. The queries are always of a choice $i \in I$.
Paul’s choice of query can depend on Carole’s previous responses. A strategy for
Paul can then naturally be represented in a decision tree form. A “line of questioning”
is a path from the root to a leaf of this tree, a sequence of questions and responses that
determine Y. The total variance of a line of questioning is the sum of the variances
of the queries in it.


Theorem 7.4.3 For all $\epsilon > 0$ there exists $\delta > 0$ so that the following holds. Suppose
Paul has a strategy for finding Y such that every line of questioning has total variance
at most $\sigma^2$. Then


$$\Pr\left[|Y - E[Y]| > \alpha\sigma\right] \le 2e^{-\alpha^2/2(1+\epsilon)} \tag{7.1}$$
for all positive $\alpha$ with $\alpha C < \sigma(1 + \epsilon)\delta$.


Applications. For a specific suboptimal bound we may take $\epsilon = \delta = 1$. If C = O(1),
$\alpha \to \infty$ and $\alpha = o(\sigma)$ the upper bound of (7.1) is $\exp[-\Omega(\alpha^2)]$. In many cases
Paul queries all $i \in I$. Then we may take $\sigma$ with $\sigma^2 = \sum_{i \in I} p_i(1 - p_i)c_i^2$. For
example, consider an edge Lipschitz Y on G(n, p) with $p = p(n) \to 0$. I is the set
of $m = \binom{n}{2}$ potential edges, all $p_i = p$, C = 1 so that $\sigma = \Theta(\sqrt{n^2 p})$. If $\alpha \to \infty$
with $\alpha = o(\sqrt{n^2 p})$ the upper bound of (7.1) is again $\exp[-\Omega(\alpha^2)]$.


Proof. For simplicity we replace Y by Y − E[Y] so that we shall henceforth assume
E[Y] = 0. By symmetry we shall bound only the upper tail of Y. We set, with
foresight, $\lambda = \alpha/[\sigma(1 + \epsilon)]$. Our side assumption gives that $C\lambda < \delta$. We will show

$$E\left[e^{\lambda Y}\right] \le e^{(1+\epsilon)\lambda^2\sigma^2/2}. \tag{7.2}$$
The Martingale Inequality then follows by the Markov bound


$$\Pr[Y > \alpha\sigma] < e^{-\lambda\alpha\sigma} E\left[e^{\lambda Y}\right] \le e^{-\alpha^2/2(1+\epsilon)}.$$

We first claim that for all $\epsilon > 0$ there exists $\delta > 0$ so that for $0 \le p \le 1$ and $|a| \le \delta$
$$pe^{(1-p)a} + (1-p)e^{-pa} \le e^{(1+\epsilon)p(1-p)a^2/2}. \tag{7.3}$$


Take the Taylor series in a of the left-hand side. The constant term is 1, the linear
term 0, the coefficient of $a^2$ is $\frac12 p(1-p)$ and for $j \ge 3$ the coefficient of $a^j$ is at most


$$\frac{1}{j!}\,p(1-p)\left[(1-p)^{j-1} + p^{j-1}\right] \le \frac{1}{j!}\,p(1-p).$$


Pick $\delta$ so that $|a| \le \delta$ implies
$$\sum_{j=3}^{\infty} \frac{a^j}{j!} \le \epsilon a^2/2.$$


(In particular, this holds for $\epsilon = \delta = 1$.) Then


$$pe^{(1-p)a} + (1-p)e^{-pa} \le 1 + p(1-p)\frac{a^2}{2}(1 + \epsilon)$$


and (7.3) follows from the inequality $1 + x \le e^x$.
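Inequality (7.3) is easy to probe numerically on a grid. The sketch below is illustrative only; the helper names and the grid resolution are assumptions of this example. It checks the case ε = δ = 1 mentioned in the text and shows that the slack ε is genuinely needed, since with ε = 0 the cubic term of the Taylor series can push the left-hand side above the right-hand side.

```python
from math import exp

def lhs_73(p, a):
    """Left-hand side of (7.3): p*e^{(1-p)a} + (1-p)*e^{-pa}."""
    return p * exp((1 - p) * a) + (1 - p) * exp(-p * a)

def rhs_73(p, a, eps=1.0):
    """Right-hand side of (7.3): e^{(1+eps) p (1-p) a^2 / 2}."""
    return exp((1 + eps) * p * (1 - p) * a * a / 2)

def max_violation(eps=1.0, delta=1.0, steps=200):
    """Largest value of lhs - rhs over a grid of 0 <= p <= 1 and |a| <= delta."""
    worst = float("-inf")
    for i in range(steps + 1):
        p = i / steps
        for j in range(-steps, steps + 1):
            a = delta * j / steps
            worst = max(worst, lhs_73(p, a) - rhs_73(p, a, eps))
    return worst
```

A non-positive `max_violation()` (up to rounding) is consistent with (7.3) for ε = δ = 1, while `max_violation(eps=0.0)` is strictly positive, for example at p = 0.1, a = 1.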

Using this $\delta$ we show (7.2) by induction on the depth M of the decision tree. For
M = 0, Y is constant and (7.2) is immediate. Otherwise, let $p, c, v = p(1-p)c^2$
denote the probability, effect and variance, respectively, of Paul’s first query. Let
$\mu_y, \mu_n$ denote the conditional expectations of Y if Carole’s response is Yes or No,
respectively. Then 0 = E[Y] can be split into


$$0 = p\mu_y + (1-p)\mu_n.$$


The difference $\mu_y - \mu_n$ is the expected change in Y when all other choices are made
independent with their respective probabilities and the root choice is changed from
Yes to No. As this always changes Y by at most c,


$$|\mu_y - \mu_n| \le c.$$
Thus we may parametrize
$$\mu_y = (1-p)b \quad\text{and}\quad \mu_n = -pb$$
with $|b| \le c$. From (7.3)
$$pe^{\lambda\mu_y} + (1-p)e^{\lambda\mu_n} \le e^{(1+\epsilon)p(1-p)b^2\lambda^2/2} \le e^{(1+\epsilon)v\lambda^2/2}.$$

Let $A_y$ denote the expectation of $e^{\lambda(Y-\mu_y)}$ conditional on Carole’s first response
being Yes and let $A_n$ denote the analogous quantity for No. Given Carole’s first
response Paul has a decision tree (one of the two main subtrees) that determines Y
with total variance at most $\sigma^2 - v$ and the tree has depth at most M − 1. So by
induction $A_y, A_n \le A^-$, where we set


$$A^- = e^{(1+\epsilon)\lambda^2(\sigma^2 - v)/2}.$$

Now we split


$$\begin{aligned}
E\left[e^{\lambda Y}\right] &= pe^{\lambda\mu_y}A_y + (1-p)e^{\lambda\mu_n}A_n \\
&\le \left[pe^{\lambda\mu_y} + (1-p)e^{\lambda\mu_n}\right]A^- \\
&\le e^{(1+\epsilon)\lambda^2(v + (\sigma^2 - v))/2}
\end{aligned}$$
completing the proof of (7.2) and hence of Theorem 7.4.3. ∎


We remark that this formal inductive proof somewhat masks the martingale. A
martingale $E[Y] = Y_0, \ldots, Y_M = Y$ can be defined with $Y_t$ the conditional expectation
of Y after the first t queries and responses. Theorem 7.4.3 can be thought of as
bounding the tail of Y by that of a normal distribution of greater or equal variance.
For very large distances from the mean, large $\alpha$, this bound fails.


7.5 FOUR ILLUSTRATIONS 


Let g be the random function from {1, ..., n} to itself, all $n^n$ possible functions
equally likely. Let L(g) be the number of values not hit, that is, the number of y for
which g(x) = y has no solution. By linearity of expectation,


$$E[L(g)] = n\left(1 - \frac1n\right)^{n},$$


and this quantity is at most n/e and at least $n(1 - 1/n)^{n-1} \cdot (1 - 1/n) \ge (n-1)/e$.

Set $B_i = \{1, \ldots, i\}$. L satisfies the Lipschitz condition relative to this gradation
since changing the value of g(i) can change L(g) by at most 1. Thus we have the
following.


Theorem 7.5.1 $\Pr\left[\left|L(g) - \frac{n}{e}\right| > \lambda\sqrt{n} + 1\right] < 2e^{-\lambda^2/2}$.


Deriving these asymptotic bounds from first principles is quite cumbersome.
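For very small n the expectation of L(g) can be checked by enumerating all $n^n$ functions. The sketch below is a hypothetical illustration (the function name is ours); it uses exact fractions so that the comparison with $n(1 - 1/n)^n$ is an equality rather than an approximation.

```python
from fractions import Fraction
from itertools import product
from math import e

def expected_missed_values(n):
    """Exact E[L(g)] by enumerating all n^n functions g on {0, ..., n-1}."""
    total = 0
    for g in product(range(n), repeat=n):
        total += n - len(set(g))  # number of values not hit by g
    return Fraction(total, n ** n)
```

For example, `expected_missed_values(2)` is 1/2, which equals $2(1 - 1/2)^2$ and lies between 1/e and 2/e.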

As a second illustration let B be any normed space and let $v_1, \ldots, v_n \in B$ with
all $|v_i| \le 1$. Let $\epsilon_1, \ldots, \epsilon_n$ be independent with $\Pr[\epsilon_i = +1] = \Pr[\epsilon_i = -1] = 1/2$
and set


$$X = |\epsilon_1 v_1 + \cdots + \epsilon_n v_n|.$$

Theorem 7.5.2
$$\Pr\left[X - E[X] > \lambda\sqrt{n}\right] < e^{-\lambda^2/2},$$
$$\Pr\left[X - E[X] < -\lambda\sqrt{n}\right] < e^{-\lambda^2/2}.$$


Proof. Consider $\{-1, +1\}^n$ as the underlying probability space with all $(\epsilon_1, \ldots, \epsilon_n)$
equally likely. Then X is a random variable and we define a martingale $X_0, \ldots, X_n = X$
by exposing one $\epsilon_i$ at a time. The value of $\epsilon_i$ can only change X by 2, so direct
application of Theorem 7.4.1 gives $|X_{i+1} - X_i| \le 2$. But let $\epsilon, \epsilon'$ be two n-tuples
differing only in the (i + 1)st coordinate:


$$X_i(\epsilon) = \tfrac12\left[X_{i+1}(\epsilon) + X_{i+1}(\epsilon')\right]$$
so that
$$|X_i(\epsilon) - X_{i+1}(\epsilon)| = \tfrac12\left|X_{i+1}(\epsilon') - X_{i+1}(\epsilon)\right| \le 1.$$
Now apply Azuma’s Inequality. ∎
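For a third illustration let $\rho$ be the Hamming metric on $\{0,1\}^n$. For $A \subseteq \{0,1\}^n$
let B(A, s) denote the set of $y \in \{0,1\}^n$ so that $\rho(x, y) \le s$ for some $x \in A$.
[$A \subseteq B(A, s)$ as we may take x = y.]


Theorem 7.5.3 Let $\epsilon, \lambda > 0$ satisfy $e^{-\lambda^2/2} = \epsilon$. Then


$$|A| \ge \epsilon 2^n \;\Rightarrow\; |B(A, 2\lambda\sqrt{n})| \ge (1 - \epsilon)2^n.$$


Proof. Consider $\{0,1\}^n$ as the underlying probability space, all points equally likely.
For $y \in \{0,1\}^n$ set
$$X(y) = \min_{x \in A} \rho(x, y).$$

Let $X_0, X_1, \ldots, X_n = X$ be the martingale given by exposing one coordinate of
$\{0,1\}^n$ at a time. The Lipschitz condition holds for X: If y, y′ differ in just one
coordinate then $|X(y) - X(y')| \le 1$. Thus, with $\mu = E[X]$,

$$\Pr\left[X < \mu - \lambda\sqrt{n}\right] < e^{-\lambda^2/2} = \epsilon,$$

$$\Pr\left[X > \mu + \lambda\sqrt{n}\right] < e^{-\lambda^2/2} = \epsilon.$$
But

$$\Pr[X = 0] = |A|2^{-n} \ge \epsilon,$$


so $\mu \le \lambda\sqrt{n}$. Thus
$$\Pr\left[X > 2\lambda\sqrt{n}\right] < \epsilon$$


and
$$|B(A, 2\lambda\sqrt{n})| = 2^n \Pr\left[X \le 2\lambda\sqrt{n}\right] \ge 2^n(1 - \epsilon).$$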


Actually, a much stronger result is known. Let B(s) denote the ball of radius s
about (0, ..., 0). The Isoperimetric Inequality proved by Harper (1966) states that


$$|A| \ge |B(r)| \;\Rightarrow\; |B(A, s)| \ge |B(r + s)|.$$


One may actually use this inequality as a beginning to give an alternate proof that
$\chi(G) \sim n/2\log_2 n$ and to prove a number of the other results we have shown using
martingales.

We illustrate Theorem 7.4.3 with a key technical lemma (in simplified form) from
Alon, Kim and Spencer (1997). Let G = (V, E) be a graph on N vertices, each
vertex having degree D. Asymptotics will be for $N, D \to \infty$. Set p = 1/D. Define
a random subgraph $H \subseteq G$ by placing each edge $e \in E$ in H with independent
probability p. Let M (for matching) be the set of isolated edges of H. Let V* be
those $v \in V$ not in any $\{v, w\} \in M$. For $v \in V$ set deg*(v) equal to the number of
$w \in V^*$ with $\{v, w\} \in E$. As


$$\Pr[w \notin V^*] = \sum_{\{u,w\} \in E} p(1-p)^{2D-2} = e^{-2} + O(D^{-1}),$$


linearity of expectation gives
$$E[\deg^*(v)] = D(1 - e^{-2}) + O(1).$$


We want deg*(v) tightly concentrated about its mean.

In the notation of Theorem 7.4.3 the probability space is determined by the choices
$e \in H$ for all $e \in E$. All $p_i = p$. Changing $e \in H$ to $e \notin H$ can change deg*(v) by
at most C = 4.

Paul needs to find deg*(v) by queries of the form “Is $e \in H$?” For each w with
$\{v, w\} \in E$ he determines if $w \in V^*$ by the following line of inquiry. First, for all
u with $\{w, u\} \in E$ he queries if $\{w, u\} \in H$. If no $\{w, u\} \in H$ then $w \in V^*$. If
two (or more) $\{w, u_1\}, \{w, u_2\} \in H$ then w cannot be in an isolated edge of H so
$w \in V^*$. Now suppose $\{w, u\} \in H$ for precisely one u. Paul then asks (using his
acquired knowledge!) for each $z \ne w$ with $\{u, z\} \in E$ if $\{u, z\} \in H$. The replies
determine if {w, u} is an isolated edge of H and hence if $w \in V^*$. Paul has made at
most D + (D − 1) queries for each w for a total of at most $D(2D - 1) = O(D^2)$
queries. Each query has variance at most $p(1-p)C^2 = O(D^{-1})$, so every line of
questioning has total variance O(D) and we may take $\sigma = \Theta(D^{1/2})$. We deduce


$$\Pr\left[\left|\deg^*(v) - D(1 - e^{-2})\right| > \lambda D^{1/2}\right] \le \exp[-\Omega(\lambda^2)]$$


when $\lambda \to \infty$ and $\lambda = o(D^{1/2})$.

In application one wishes to iterate this procedure (now applying it to the restriction
of G to V*) in order to find a large matching. This is somewhat akin to the Rödl nibble
of Section 4.7. There are numerous further complications but the tight concentration
of deg*(v) about its mean plays an indispensable role.
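Paul's line of inquiry for deg*(v) can be simulated directly. The sketch below is a hypothetical illustration, not the authors' procedure verbatim: it assumes a circulant graph as a convenient D-regular test case, represents H as a set of two-element frozensets, and treats a membership test in that set as one query to Carole. All function names are ours.

```python
import random

def circulant_graph(n_vertices, degree):
    """Adjacency sets of a degree-regular circulant graph (degree must be even)."""
    half = degree // 2
    return {v: {(v + s) % n_vertices for s in range(-half, half + 1) if s != 0}
            for v in range(n_vertices)}

def random_subgraph(adj, p, rng):
    """Keep each edge of the graph independently with probability p."""
    return {frozenset((u, w)) for u in adj for w in adj[u]
            if u < w and rng.random() < p}

def paul_deg_star(adj, h_edges, v):
    """Find deg*(v) by edge queries; return (deg*(v), number of queries made)."""
    queries = 0
    deg_star = 0
    for w in adj[v]:
        hits = []
        for u in adj[w]:                      # first round: every edge at w
            queries += 1
            if frozenset((w, u)) in h_edges:
                hits.append(u)
        if len(hits) != 1:
            deg_star += 1                     # w is in no isolated edge, so w is in V*
            continue
        u = hits[0]
        isolated = True
        for z in adj[u]:                      # second round: the other edges at u
            if z != w:
                queries += 1
                if frozenset((u, z)) in h_edges:
                    isolated = False
        if not isolated:
            deg_star += 1
    return deg_star, queries

def brute_force_deg_star(adj, h_edges, v):
    """deg*(v) computed directly from the definition of V*."""
    def h_degree(x):
        return sum(frozenset((x, y)) in h_edges for y in adj[x])
    matched = set()
    for edge in h_edges:
        a, b = tuple(edge)
        if h_degree(a) == 1 and h_degree(b) == 1:
            matched.update((a, b))
    return sum(w not in matched for w in adj[v])
```

On any input the query count returned by `paul_deg_star` stays within D(2D − 1), and its answer agrees with `brute_force_deg_star`.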


7.6 TALAGRAND’S INEQUALITY 


Let $\Omega = \prod_{i=1}^{n} \Omega_i$, where each $\Omega_i$ is a probability space and $\Omega$ has the product
measure. Let $A \subseteq \Omega$ and let $\vec{x} = (x_1, \ldots, x_n) \in \Omega$. Talagrand (1996) gives an
unusual, subtle and ultimately powerful notion of the distance, denoted $\rho(A, \vec{x})$,
from $\vec{x}$ to A. We imagine moving from $\vec{x}$ to some $\vec{y} = (y_1, \ldots, y_n) \in A$ by changing
coordinates. $\rho(A, \vec{x})$ will measure the minimal cost of such a move when a suitably
restricted adversary sets the cost of each change.

Definition 2 $\rho(A, \vec{x})$ is the least value such that for any $\vec{\alpha} = (\alpha_1, \ldots, \alpha_n) \in R^n$
with $|\vec{\alpha}| = 1$ there exists $\vec{y} = (y_1, \ldots, y_n) \in A$ with

$$\sum_{x_i \ne y_i} \alpha_i \le \rho(A, \vec{x}).$$


Note that $\vec{y}$ can, and generally will, depend on $\vec{\alpha}$.
We define for any real $t \ge 0$,

$$A_t = \{\vec{x} \in \Omega : \rho(A, \vec{x}) \le t\}.$$


Theorem 7.6.1 [Talagrand’s Inequality]
$$\Pr[A]\,(1 - \Pr[A_t]) \le e^{-t^2/4}.$$


In particular, if $\Pr[A] \ge \frac12$ (or any fixed constant) and t is “very large” then all
but a very small proportion of $\Omega$ is within “distance” t of A.


Example. Take $\Omega = \{0,1\}^n$ with the uniform distribution and let $\tau$ be the Hamming
($L^1$) metric. Then $\rho(A, \vec{x}) \ge \min_{\vec{y} \in A} \tau(\vec{x}, \vec{y})\, n^{-1/2}$ as the adversary can choose all
$\alpha_i = n^{-1/2}$. Suppose to move from $\vec{x}$ to A the values $x_1, \ldots, x_l$ (or any particular
l coordinates) must be changed. Then $\rho(A, \vec{x}) \ge l^{1/2}$ as the adversary could choose
$\alpha_i = l^{-1/2}$ for $1 \le i \le l$ and zero elsewhere.

Define $U(A, \vec{x})$ to be the set of $\vec{s} = (s_1, \ldots, s_n) \in \{0,1\}^n$ with the property that
there exists $\vec{y} \in A$ such that


$$x_i \ne y_i \;\Rightarrow\; s_i = 1.$$


We may think of $U(A, \vec{x})$ as representing the possible paths from $\vec{x}$ to A. Note that
when $s_i = 1$ we, for somewhat technical reasons, do not require $x_i \ne y_i$. With this
notation $\rho(A, \vec{x})$ is the least real so that for all $\vec{\alpha}$ with $|\vec{\alpha}| = 1$ there exists $\vec{s} \in U(A, \vec{x})$
with $\vec{\alpha} \cdot \vec{s} \le \rho(A, \vec{x})$.

Now define $V(A, \vec{x})$ to be the convex hull of $U(A, \vec{x})$. The following result gives
an alternate characterization of $\rho$ that supplies the concept with much of its richness.


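Theorem 7.6.2


$$\rho(A, \vec{x}) = \min_{\vec{v} \in V(A, \vec{x})} |\vec{v}|.$$


Proof. Let $\vec{v} \in V(A, \vec{x})$ achieve this minimum. The hyperplane through $\vec{v}$ perpendicular
to the line from the origin to $\vec{v}$ then separates $V(A, \vec{x})$ from the origin so that
all $\vec{s} \in V(A, \vec{x})$ have $\vec{s} \cdot \vec{v} \ge \vec{v} \cdot \vec{v}$. Set $\vec{\alpha} = \vec{v}/|\vec{v}|$. Then all $\vec{s} \in U(A, \vec{x}) \subseteq V(A, \vec{x})$
have $\vec{s} \cdot \vec{\alpha} \ge \vec{v} \cdot \vec{v}/|\vec{v}| = |\vec{v}|$. Conversely, take any $\vec{\alpha}$ with $|\vec{\alpha}| = 1$. Then $\vec{\alpha} \cdot \vec{v} \le |\vec{v}|$.
As $\vec{v} \in V(A, \vec{x})$ we may write $\vec{v} = \sum \lambda_i \vec{s}_i$ for some $\vec{s}_i \in U(A, \vec{x})$, with all $\lambda_i \ge 0$
and $\sum \lambda_i = 1$. Then


$$|\vec{v}| \ge \sum \lambda_i (\vec{\alpha} \cdot \vec{s}_i)$$


and hence some $\vec{\alpha} \cdot \vec{s}_i \le |\vec{v}|$. ∎


The case $\Omega = \{0,1\}^n$ is particularly important and instructive. There $\rho(A, \vec{x})$ is
simply the Euclidean distance from $\vec{x}$ to the convex hull of A.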


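Theorem 7.6.3


$$\int_{\Omega} \exp\left[\tfrac14 \rho^2(A, \vec{x})\right] d\vec{x} \le \frac{1}{\Pr[A]}.$$


Theorem 7.6.1 is an immediate corollary of the above result. Indeed, fix A and
consider the random variable $X = \rho(A, \vec{x})$. Then


$$\Pr\left[\overline{A_t}\right] \le \Pr[X \ge t] = \Pr\left[e^{X^2/4} \ge e^{t^2/4}\right] \le E\left[e^{X^2/4}\right] e^{-t^2/4},$$


and the theorem states $E\left[e^{X^2/4}\right] \le 1/\Pr[A]$.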


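Proof [Theorem 7.6.3]. We use induction on the dimension n. For n = 1, $\rho(A, \vec{x}) = 1$
if $\vec{x} \notin A$, zero otherwise so that


$$\int_{\Omega} \exp\left[\tfrac14 \rho^2(A, \vec{x})\right] = \Pr[A] + (1 - \Pr[A])e^{1/4} \le \frac{1}{\Pr[A]}$$


as the inequality $u + (1-u)e^{1/4} \le u^{-1}$ for $0 < u \le 1$ is a simple calculus exercise.
Assume the result for n. Write $OLD = \prod_{i=1}^{n} \Omega_i$, $NEW = \Omega_{n+1}$ so that
$\Omega = OLD \times NEW$ and any $z \in \Omega$ can be uniquely written $z = (x, \omega)$ with
$x \in OLD$, $\omega \in NEW$. Set


$$B = \{x \in OLD : (x, \omega) \in A \text{ for some } \omega \in NEW\}$$
and for any $\omega \in NEW$ set
$$A_\omega = \{x \in OLD : (x, \omega) \in A\}.$$
Given $z = (x, \omega) \in \Omega$ we can move to A in two basic ways: either by changing
$\omega$, which reduces the problem to moving from x to B, or by not changing $\omega$, which
reduces the problem to moving from x to $A_\omega$. Thus
$$\vec{s} \in U(B, x) \;\Rightarrow\; (\vec{s}, 1) \in U(A, (x, \omega))$$


and
$$\vec{t} \in U(A_\omega, x) \;\Rightarrow\; (\vec{t}, 0) \in U(A, (x, \omega)).$$
Taking the convex hulls, if $\vec{s} \in V(B, x)$ and $\vec{t} \in V(A_\omega, x)$ then $(\vec{s}, 1)$ and $(\vec{t}, 0)$ are
in $V(A, (x, \omega))$ and hence for any $\lambda \in [0, 1]$,
$$((1 - \lambda)\vec{s} + \lambda\vec{t},\ 1 - \lambda) \in V(A, (x, \omega)).$$
Then, by convexity,
$$\rho^2(A, (x, \omega)) \le (1 - \lambda)^2 + |(1 - \lambda)\vec{s} + \lambda\vec{t}|^2 \le (1 - \lambda)^2 + (1 - \lambda)|\vec{s}|^2 + \lambda|\vec{t}|^2.$$
Selecting $\vec{s}, \vec{t}$ with minimal norms yields the critical inequality
$$\rho^2(A, (x, \omega)) \le (1 - \lambda)^2 + \lambda\rho^2(A_\omega, x) + (1 - \lambda)\rho^2(B, x).$$


Quoting from Talagrand, “The main trick of the proof is to resist the temptation
to optimize now over $\lambda$.” Rather, we first fix $\omega$ and bound


$$\int_x \exp\left[\tfrac14 \rho^2(A, (x, \omega))\right]
\le e^{(1-\lambda)^2/4} \int_x \left(\exp\left[\tfrac14 \rho^2(A_\omega, x)\right]\right)^{\lambda} \left(\exp\left[\tfrac14 \rho^2(B, x)\right]\right)^{1-\lambda}.$$


By Hölder’s Inequality this is at most


$$e^{(1-\lambda)^2/4} \left[\int_x \exp\left[\tfrac14 \rho^2(A_\omega, x)\right]\right]^{\lambda} \left[\int_x \exp\left[\tfrac14 \rho^2(B, x)\right]\right]^{1-\lambda},$$


which by induction is at most


$$e^{(1-\lambda)^2/4} \left(\frac{1}{\Pr[A_\omega]}\right)^{\lambda} \left(\frac{1}{\Pr[B]}\right)^{1-\lambda} = \frac{1}{\Pr[B]}\, e^{(1-\lambda)^2/4}\, r^{-\lambda},$$


where $r = \Pr[A_\omega]/\Pr[B] \le 1$. Now we use calculus and minimize $e^{(1-\lambda)^2/4} r^{-\lambda}$
by choosing $\lambda = 1 + 2\ln r$ for $e^{-1/2} \le r \le 1$ and $\lambda = 0$ otherwise. Further
(somewhat tedious but simple) calculation shows $e^{(1-\lambda)^2/4} r^{-\lambda} \le 2 - r$ for this
$\lambda = \lambda(r)$. Thus


$$\int_x \exp\left[\tfrac14 \rho^2(A, (x, \omega))\right] \le \frac{1}{\Pr[B]} \left(2 - \frac{\Pr[A_\omega]}{\Pr[B]}\right).$$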
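The two calculus facts used so far in this proof, the base case inequality $u + (1-u)e^{1/4} \le u^{-1}$ and the bound $e^{(1-\lambda)^2/4} r^{-\lambda} \le 2 - r$ at the chosen $\lambda(r)$, can be spot-checked numerically. The sketch below is only a sanity check on a grid, not a proof, and the function names are hypothetical.

```python
from math import exp, log

def best_lambda(r):
    """The choice of lambda in the proof: 1 + 2 ln r, or 0 when r < e^{-1/2}."""
    return 1 + 2 * log(r) if r >= exp(-0.5) else 0.0

def holder_factor(r):
    """e^{(1-lambda)^2/4} * r^{-lambda} evaluated at lambda = best_lambda(r)."""
    lam = best_lambda(r)
    return exp((1 - lam) ** 2 / 4) * r ** (-lam)

def base_case_gap(u):
    """1/u - (u + (1-u) e^{1/4}); the n = 1 case needs this to be nonnegative."""
    return 1 / u - (u + (1 - u) * exp(0.25))
```

Both inequalities are tight at the right endpoint: `holder_factor(1.0)` equals 1 and `base_case_gap(1.0)` equals 0.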
We integrate over $\omega$ giving


$$\int_\omega \int_x \exp\left[\tfrac14 \rho^2(A, (x, \omega))\right] \le \frac{1}{\Pr[B]} \left(2 - \frac{\Pr[A]}{\Pr[B]}\right) = \frac{1}{\Pr[A]}\, x(2 - x),$$


where $x = \Pr[A]/\Pr[B] \in [0, 1]$. But $x(2 - x) \le 1$, completing the induction and
hence the theorem. ∎


7.7 APPLICATIONS OF TALAGRAND’S INEQUALITY


Let $\Omega = \prod_{i=1}^{n} \Omega_i$, where each $\Omega_i$ is a probability space and $\Omega$ has the product
measure. Let $h : \Omega \to R$. Talagrand’s Inequality enables us, under certain conditions,
to show that the random variable $X = h(\cdot)$ is tightly concentrated. In this sense it
can serve the same function Azuma’s Inequality does for martingales and there are
many cases in which it gives far stronger results.

We call $h : \Omega \to R$ Lipschitz if $|h(x) - h(y)| \le 1$ whenever x, y differ in at
most one coordinate. Talagrand’s Inequality is most effective on those Lipschitz
functions with the property that when $h(x) \ge s$ there are a relatively small number
of coordinates that will certify that $h(x) \ge s$. We formalize this notion as follows.


Definition 3 Let $f : N \to N$. h is f-certifiable if whenever $h(x) \ge s$ there exists
$I \subseteq \{1, \ldots, n\}$ with $|I| \le f(s)$ so that all $y \in \Omega$ that agree with x on the coordinates
I have $h(y) \ge s$.


Example. Consider G(n, p) as the product of $\binom{n}{2}$ coin flips and let h(G) be the
number of triangles in G. Then h is f-certifiable with f(s) = 3s. For if $h(G) \ge s$
there exist s triangles that together have at most 3s edges and any other G′ with those
3s edges has $h(G') \ge s$. Note that I, here the indices for those 3s edges, very much
depends on G. Also note that we need certify only lower bounds for h.


Theorem 7.7.1 Under the above assumptions and for all b, t,


$$\Pr\left[X \le b - t\sqrt{f(b)}\right] \Pr[X \ge b] \le e^{-t^2/4}.$$


Proof. Set $A = \{x : h(x) < b - t\sqrt{f(b)}\}$. Now suppose $h(y) \ge b$. We claim
$y \notin A_t$. Let I be a set of indices of size at most f(b) that certifies $h(y) \ge b$ as given
above. Define $\alpha_i = 0$ when $i \notin I$, $\alpha_i = |I|^{-1/2}$ when $i \in I$. If $y \in A_t$ there exists
a $z \in A$ that differs from y in at most $t|I|^{1/2} \le t\sqrt{f(b)}$ coordinates of I though at
arbitrary coordinates outside I. Let y′ agree with y on I and agree with z outside I.
By the certification $h(y') \ge b$. Now y′, z differ in at most $t\sqrt{f(b)}$ coordinates and
so, by Lipschitz,
$$h(z) \ge h(y') - t\sqrt{f(b)} \ge b - t\sqrt{f(b)},$$


but then $z \notin A$, a contradiction. So $\Pr[X \ge b] \le \Pr\left[\overline{A_t}\right]$ and from Talagrand’s
Theorem,


$$\Pr\left[X < b - t\sqrt{f(b)}\right] \Pr[X \ge b] \le e^{-t^2/4}.$$


As the right-hand side is continuous in t we may replace ‘<’ by ‘≤’ giving the
theorem. ∎


A small generalization is sometimes useful. Call $h : \Omega \to R$ K-Lipschitz if
$|h(x) - h(y)| \le K$ whenever x, y differ in only one coordinate. Applying the above
theorem to h/K, which is Lipschitz, we find
$$\Pr\left[X \le b - tK\sqrt{f(b)}\right] \Pr[X \ge b] \le e^{-t^2/4}.$$


In applications one often takes b to be the median so that for t large the probability
of being $t\sqrt{f(b)}$ under the median goes sharply to zero. But it works both ways:
by parametrizing so that $m = b - t\sqrt{f(b)}$ is the median one usually gets
$b \sim m + t\sqrt{f(m)}$ and that the probability of being $t\sqrt{f(b)}$ above the median goes sharply
to zero. Martingales, via Azuma’s Inequality, generally produce a concentration
result around the mean $\mu$ of X while Talagrand’s Inequality yields a concentration
result about the median m. Means tend to be easy to compute, medians notoriously
difficult, but a tight concentration result will generally allow us to show that the mean
and median are not far away.

Let $x = (x_1, \ldots, x_n)$, where the $x_i$ are independently and uniformly chosen from
[0, 1]. Set X = h(x) to be the length of the longest increasing subsequence of x.
Elementary methods give that $c_1 n^{1/2} < X < c_2 n^{1/2}$ almost surely for some positive
constants $c_1, c_2$ and that the mean $\mu$ and median m of X are both in that range.
Also X is Lipschitz, as changing one $x_i$ can only change X by at most one. How
concentrated is X? We can apply Azuma’s Inequality to deduce that if $s \gg n^{1/2}$
then $|X - \mu| < s$ almost surely. This is not particularly good since X itself is only of
order $n^{1/2}$. Now consider Talagrand’s Inequality. X is f-certifiable with f(s) = s
since if x has an increasing subsequence of length s then those s coordinates certify
that $X \ge s$. Then $\Pr[X \le m - tm^{1/2}] \le e^{-t^2/4}/\Pr[X \ge m] \le 2e^{-t^2/4}$ as m is
the median value. But $m = \Theta(n^{1/2})$. Thus when $s \gg n^{1/4}$ we have $X \ge m - s$
almost surely. For the other side suppose $t \to \infty$ slowly and let b be such that
$b - tb^{1/2} = m$. Then $\Pr[X \ge b] \le e^{-t^2/4}/\Pr[X \le m] \le 2e^{-t^2/4}$. Then $X \le b$
almost surely. But $b = m + (1 + o(1))tm^{1/2}$ so that $X \le m + tm^{1/2}$ almost surely.
Combining, if $s \gg n^{1/4}$ then $|X - m| < s$ almost surely. A much stronger result,
determining the precise asymptotic distribution of X, has been obtained by Baik,
Deift and Johansson (1999), using deep analytic tools.
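The certificate used above, the s coordinates of an increasing subsequence, can be produced explicitly. The sketch below is an illustration under our own naming; it uses the standard patience-sorting technique with parent pointers (not a method described in the text) to return the index set I of one longest strictly increasing subsequence.

```python
from bisect import bisect_left

def longest_increasing_subsequence(x):
    """Return the indices of one longest strictly increasing subsequence of x."""
    tails = []        # tails[k]: index of the smallest last element of a run of length k+1
    tail_values = []  # the x-values at those indices, kept sorted for bisect
    parent = [None] * len(x)
    for i, value in enumerate(x):
        k = bisect_left(tail_values, value)
        parent[i] = tails[k - 1] if k > 0 else None
        if k == len(tails):
            tails.append(i)
            tail_values.append(value)
        else:
            tails[k] = i
            tail_values[k] = value
    # Walk the parent pointers back from the end of the longest run.
    certificate = []
    i = tails[-1] if tails else None
    while i is not None:
        certificate.append(i)
        i = parent[i]
    return certificate[::-1]

def lis_length(x):
    """X = h(x), the length of the longest increasing subsequence."""
    return len(longest_increasing_subsequence(x))
```

Overwriting every coordinate outside the returned index set cannot reduce `lis_length` below the certificate size, which is the f-certifiability with f(s) = s; changing a single coordinate moves `lis_length` by at most one, which is the Lipschitz property.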

Let’s reexamine the bound (Theorem 7.3.2) that $G(n, \frac12)$ has no clique of size k
with k as defined there. We let, as there, Y be the maximal number of edge disjoint
k-cliques. From the work there $E[Y] = \Omega(n^2 k^{-4})$ and Y is tightly concentrated
about E[Y] so that the median m of Y must also have $m = \Omega(n^2 k^{-4})$. As before Y
is Lipschitz. Further Y is f-certifiable with $f(s) = \binom{k}{2}s$ as the edges of the s cliques
certify that $Y \ge s$. Hence


$$\Pr\left[Y \le m - tm^{1/2}\binom{k}{2}^{1/2}\right] \Pr[Y \ge m] < e^{-t^2/4}.$$


Set $t = \Theta(m^{1/2}/k)$ so that $m = tm^{1/2}\binom{k}{2}^{1/2}$. Then


$$\Pr[\omega(G) < k] = \Pr[Y \le 0] < 2e^{-t^2/4} \le \exp\left[-\Omega\left(\frac{n^2}{\ln^6 n}\right)\right],$$


which improves the bound of Theorem 7.3.2. Still, we should note that application
of the Extended Janson Inequality in Section 10.3 does even better.


7.8 KIM-VU POLYNOMIAL CONCENTRATION 


The approach of Kim and Vu (2000) is often useful. Let H = (V(H), E(H)) be
a hypergraph and let each edge $e \in E(H)$ have a nonnegative weight w(e). Let
$t_i$, $i \in V(H)$ be mutually independent indicator random variables with $E[t_i] = p_i$.
Consider the random variable polynomial


$$Y = \sum_{e \in E(H)} w(e) \prod_{i \in e} t_i.$$


We allow $e = \emptyset$ in which case $\prod_{i \in e} t_i$ is by convention 1. We want to show that Y
is concentrated about its mean.

Let $S \subseteq V(H)$ be a random set given by $\Pr[i \in S] = p_i$, these events mutually
independent over $i \in V(H)$. Then Y is the weighted number of hyperedges e in the
restriction of H to S. In applications we generally have all weights equal one so that
Y simply counts the hyperedges in the random S. But we may also think abstractly of
Y as simply any polynomial over the indicators $t_i$ having all nonnegative coefficients.

We set n = |V(H)|, the number of vertices of H (number of variables $t_i$). Let k
be an upper bound on the size of all hyperedges (upper bound on the degree of the
polynomial Y).

Let $A \subseteq V(H)$ with $|A| \le k$. We truncate Y to $Y_A$ as follows: For those terms
$\prod_{i \in e} t_i$ with $A \subseteq e$ we set $t_i = 1$ for all $i \in A$, replacing the term by $\prod_{i \in e - A} t_i$. All
other terms (where e does not contain A) are deleted. For example, with A = {1},
$2t_1t_2 + 5t_1t_3t_4 + 7t_2t_4$ becomes $2t_2 + 5t_3t_4$. Intriguingly, as polynomials in the $t_i$,
$Y_A$ is the partial derivative of Y with respect to the $t_i$, $i \in A$. Set $E_A = E[Y_A]$. That
is, $E_A$ is the expected number of hyperedges in S that contain A, conditional on all
vertices of A being in S. Set $E_i$ equal to the maximal $E_A$ over all $A \subseteq V(H)$ of size
i. Set $\mu = E[Y]$ for convenience and set


$$E' = \max_{1 \le i \le k} E_i \quad\text{and}\quad E = \max\{\mu, E'\}.$$


Theorem 7.8.1 [Kim-Vu Polynomial Concentration] With the above hypotheses
$$\Pr\left[|Y - \mu| > a_k (EE')^{1/2} \lambda^k\right] < d_k e^{-\lambda} n^{k-1}$$
for any $\lambda > 1$.


Here, for definiteness, we may take $a_k = 8^k k!^{1/2}$ and $d_k = 2e^2$.
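The truncation $Y_A$ and the quantities $E_A$ and E′ are mechanical to compute for a small polynomial. The sketch below is a hypothetical illustration: it stores Y as a dictionary from hyperedges (frozensets of variable indices) to weights, a representation chosen for this example only, and the function names are ours.

```python
from fractions import Fraction
from itertools import combinations

def truncate(poly, a):
    """Return Y_A: keep the terms whose edge contains A and drop the variables of A."""
    a = frozenset(a)
    result = {}
    for edge, weight in poly.items():
        if a <= edge:
            reduced = edge - a
            result[reduced] = result.get(reduced, 0) + weight
    return result

def expectation(poly, p):
    """E[Y] for independent indicators t_i with E[t_i] = p[i]."""
    total = Fraction(0)
    for edge, weight in poly.items():
        term = Fraction(weight)
        for i in edge:
            term *= p[i]
        total += term
    return total

def e_prime(poly, p, k):
    """E' = max over nonempty A with |A| <= k of E_A = E[Y_A]."""
    vertices = sorted(set().union(*poly))
    return max(expectation(truncate(poly, a), p)
               for size in range(1, k + 1)
               for a in combinations(vertices, size))

# The example from the text: Y = 2 t1 t2 + 5 t1 t3 t4 + 7 t2 t4.
Y = {frozenset({1, 2}): 2, frozenset({1, 3, 4}): 5, frozenset({2, 4}): 7}
```

Here `truncate(Y, {1})` represents $2t_2 + 5t_3t_4$, matching the text. With all $p_i = 1/2$ one gets $\mu = 23/8$ and E′ = 7 (attained at A = {2, 4}), so E = max{μ, E′} = 7.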

We omit the proof, which combines martingale inequalities similar to those of 
Theorem 7.4.3 with a subtle induction on the degree k. There may well be room for 
improvement in the az, d, and n*~! terms. In applications one generally has k fixed 
and > Inn so that the e~* term dominates the probability bound. 
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Applications of Kim—Vu polynomial concentration tend to be straightforward. Let 
G ~ G(n,p) with p = n~% and assume 0 < a < 2/3. Fix a vertex x of G and let 
Y = Y(z) be the number of triangles containing x. Set p = E[Y] = Con ~ 
4n?—8o_ Let 6 > 0 be fixed. We want to bound Pr [|Y — | > dy). 

The random graph $G$ is defined by the random variables $t_{ij}$, one for each unordered
pair of vertices, which are indicators of the adjacency of the two vertices. In that
context
$$Y = \sum_{i,j \ne x} t_{xi} t_{xj} t_{ij}.$$

This is a polynomial of degree $k = 3$. When $A$ consists of a single edge $\{x,i\}$ we
find $E_A = (n-2)p^2$; when it consists of three edges forming a triangle containing
$x$ we find $E_A = 1$. When $A = \emptyset$, $E_A = \mu$. Other cases give smaller $E_A$. Basically
$E' \sim \max\{np^2, 1\}$. Calculation gives $E' \le c\mu n^{-\epsilon}$ for some positive $\epsilon$ (dependent
on $\alpha$) throughout our range. We apply Kim-Vu polynomial concentration with $\lambda =
c' n^{\epsilon/6}$, $c'$ a small positive constant, to bound $\Pr[|Y - \mu| > \delta\mu]$ by $\exp[-\Omega(n^{\epsilon/6})]$.
Note that the $n^{k-1}$ factor is absorbed by the exponential.

In particular, as this probability is $o(n^{-1})$, we have that almost surely every vertex
$x$ is in $\sim \mu$ triangles. This result generalizes. Fix $\alpha \in (0,1)$ and suppose $(R, H)$ is a
rooted graph, safe, in the sense of Section 10.4, with respect to $\alpha$. Let $G \sim G(n,p)$
with $p = n^{-\alpha}$. For distinct vertices $x_1, \ldots, x_r$ let $Y = Y(x_1, \ldots, x_r)$ denote the
number of extensions in $G$ to $H$. Set $\mu = E[Y]$. Kim-Vu polynomial concentration
gives an exponentially small upper bound on the probability that $Y$ is not near $\mu$. In
particular, this probability is $o(n^{-r})$. Hence almost surely every $r$ vertices have $\sim \mu$
extensions to $H$.


7.9 EXERCISES 


1. Let $G = (V, E)$ be the graph whose vertices are all $7^n$ vectors of length $n$
over $Z_7$, in which two vertices are adjacent iff they differ in precisely one
coordinate. Let $U \subset V$ be a set of $7^{n-1}$ vertices of $G$, and let $W$ be the set of
all vertices of $G$ whose distance from $U$ exceeds $(c + 2)\sqrt{n}$, where $c > 0$ is a
constant. Prove that $|W| \le 7^n \cdot e^{-c^2/2}$.


2. (*) Let $G = (V,E)$ be a graph with chromatic number $\chi(G) = 1000$. Let
$U \subset V$ be a random subset of $V$ chosen uniformly among all $2^{|V|}$ subsets of
$V$. Let $H = G[U]$ be the induced subgraph of $G$ on $U$. Prove that

$$\Pr[\chi(H) \le 400] < 1/100.$$


3. Prove that there is an absolute constant $c$ such that for every $n > 1$ there is an
interval $I_n$ of at most $c\sqrt{n}/\log n$ consecutive integers such that the probability
that the chromatic number of $G(n, 0.5)$ lies in $I_n$ is at least 0.99.


THE PROBABILISTIC LENS: 


Weierstrass Approximation Theorem


The well-known Weierstrass Approximation Theorem asserts that the set of real
polynomials over $[0,1]$ is dense in the space of all continuous real functions over
$[0,1]$. This is stated in the following theorem.


Theorem 1 [Weierstrass Approximation Theorem] For every continuous real
function $f : [0,1] \to \mathbb{R}$ and every $\epsilon > 0$, there is a polynomial $p(x)$ such that
$|p(x) - f(x)| \le \epsilon$ for all $x \in [0,1]$.


Bernstein (1912) gave a charming probabilistic proof of this theorem, based on 
the properties of the binomial distribution. His proof is the following. 


Proof. Since a continuous $f : [0,1] \to \mathbb{R}$ is uniformly continuous there is a $\delta > 0$
such that if $x, x' \in [0,1]$ and $|x - x'| \le \delta$ then $|f(x) - f(x')| \le \epsilon/2$. In addition,
since $f$ must be bounded there is an $M > 0$ such that $|f(x)| \le M$ in $[0,1]$.

Let $B(n,x)$ denote the binomial random variable with $n$ independent trials and
probability of success $x$ for each of them. Thus the probability that $B(n,x) = j$
is precisely $\binom{n}{j} x^j (1-x)^{n-j}$. The expectation of $B(n,x)$ is $nx$ and its standard
deviation is $\sqrt{nx(1-x)} \le \sqrt{n}$. Therefore, by Chebyshev's Inequality discussed
in Chapter 4, for every integer $n$, $\Pr[|B(n,x) - nx| > n^{2/3}] \le 1/n^{1/3}$. It follows
that there is an integer $n$ such that

$$\Pr\left[|B(n,x) - nx| > n^{2/3}\right] < \frac{\epsilon}{4M}$$

and

$$\frac{1}{n^{1/3}} < \delta.$$
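Define
$$P_n(x) = \sum_{i=0}^{n} \binom{n}{i} x^i (1-x)^{n-i} f(i/n).$$
We claim that for every $x \in [0,1]$, $|P_n(x) - f(x)| \le \epsilon$. Indeed, since

$$\sum_{i=0}^{n} \binom{n}{i} x^i (1-x)^{n-i} = 1,$$

we have
$$|P_n(x) - f(x)| \le \sum_{i : |i - nx| \le n^{2/3}} \binom{n}{i} x^i (1-x)^{n-i} |f(i/n) - f(x)|
+ \sum_{i : |i - nx| > n^{2/3}} \binom{n}{i} x^i (1-x)^{n-i} \left[|f(i/n)| + |f(x)|\right]$$
$$\le \sum_{i : |i/n - x| \le n^{-1/3} < \delta} \binom{n}{i} x^i (1-x)^{n-i} |f(i/n) - f(x)|
+ 2M \Pr\left[|B(n,x) - nx| > n^{2/3}\right]$$
$$\le \frac{\epsilon}{2} + 2M \frac{\epsilon}{4M} = \epsilon.$$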

This completes the proof. 
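Bernstein's polynomial $P_n$ is easy to evaluate directly, and a short numerical sketch may help in seeing the convergence. The test function $|x - 1/2|$, the grid size, and the names `bernstein` and `max_error` below are our own illustrative choices; the grid maximum only approximates the true supremum.

```python
from math import comb


def bernstein(f, n, x):
    """P_n(x) = sum_i C(n,i) x^i (1-x)^(n-i) f(i/n), i.e. E[f(B(n,x)/n)]."""
    return sum(comb(n, i) * x**i * (1 - x) ** (n - i) * f(i / n) for i in range(n + 1))


def max_error(f, n, grid=201):
    """Largest |P_n(x) - f(x)| over an evenly spaced grid on [0, 1]."""
    points = [k / (grid - 1) for k in range(grid)]
    return max(abs(bernstein(f, n, x) - f(x)) for x in points)


def f(x):
    # continuous on [0, 1] but not differentiable at 1/2
    return abs(x - 0.5)


errors = [max_error(f, n) for n in (10, 40, 160)]
```

For this 1-Lipschitz $f$ the error at $x$ is at most the standard deviation of $B(n,x)/n$, so it is bounded by $1/(2\sqrt{n})$; the convergence is slow, which matches the Chebyshev-based argument.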


The Poisson Paradigm 


One of the things that attracts us most when we apply ourselves to a mathematical 
problem is precisely that within us we always hear the call: here is the problem, 
search for the solution, you can find it by pure thought, for in mathematics there 
is no ignorabimus. 


— David Hilbert 


When $X$ is the sum of many rare indicator “mostly independent” random variables
and $\mu = E[X]$, we would like to say that $X$ is close to a Poisson distribution with
mean $\mu$ and, in particular, that $\Pr[X = 0]$ is nearly $e^{-\mu}$. We call this rough statement
the Poisson Paradigm. In this chapter we give a number of situations in which this
paradigm may be rigorously proved.


8.1 THE JANSON INEQUALITIES 


In many instances we would like to bound the probability that none of a set of bad
events $\{B_i\}_{i \in I}$ occur. If the events are mutually independent then

$$\Pr\left[\bigwedge_{i \in I} \overline{B_i}\right] = \prod_{i \in I} \Pr[\overline{B_i}].$$

When the $B_i$ are “mostly” independent the Janson Inequalities allow us, sometimes,
to say that these two quantities are “nearly” equal.
Let $\Omega$ be a finite universal set and let $R$ be a random subset of $\Omega$ given by

$$\Pr[r \in R] = p_r,$$

these events mutually independent over $r \in \Omega$. Let $\{A_i\}_{i \in I}$ be subsets of $\Omega$, $I$ a finite
index set. Let $B_i$ be the event $A_i \subseteq R$. (That is, each point $r \in \Omega$ “flips a coin” to
determine if it is in $R$. $B_i$ is the event that the coins for all $r \in A_i$ came up “heads.”)
Let $X_i$ be the indicator random variable for $B_i$ and $X = \sum_{i \in I} X_i$ the number of
$A_i \subseteq R$. The events $\bigwedge_{i \in I} \overline{B_i}$ and $X = 0$ are then identical. For $i, j \in I$ we write
$i \sim j$ if $i \ne j$ and $A_i \cap A_j \ne \emptyset$. Note that when $i \ne j$ and not $i \sim j$ then $B_i$, $B_j$
are independent events since they involve separate coin flips. Furthermore, and this
plays a crucial role in the proofs, if $i \notin J \subset I$ and not $i \sim j$ for all $j \in J$ then $B_i$ is
mutually independent of $\{B_j\}_{j \in J}$, that is, independent of any Boolean function of
those $B_j$. This is because the coin flips on $A_i$ and on $\bigcup_{j \in J} A_j$ are independent. We
define
$$\Delta = \sum_{i \sim j} \Pr[B_i \wedge B_j].$$


Here the sum is over ordered pairs so that $\Delta/2$ gives the same sum over unordered
pairs. We set
$$M = \prod_{i \in I} \Pr[\overline{B_i}],$$
the value of $\Pr[\bigwedge_{i \in I} \overline{B_i}]$ if the $B_i$ were independent. Finally, we set
$$\mu = E[X] = \sum_{i \in I} \Pr[B_i].$$
The following results were given in Janson, Łuczak and Ruciński (1990).


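Theorem 8.1.1 [The Janson Inequality] Let $\{B_i\}_{i \in I}$, $\Delta$, $M$, $\mu$ be as above and
assume all $\Pr[B_i] \le \epsilon$. Then

$$M \le \Pr\left[\bigwedge_{i \in I} \overline{B_i}\right] \le M e^{\frac{1}{1-\epsilon} \frac{\Delta}{2}}$$

and, further,
$$\Pr\left[\bigwedge_{i \in I} \overline{B_i}\right] \le e^{-\mu + \Delta/2}.$$
For each $i \in I$,
$$\Pr[\overline{B_i}] = 1 - \Pr[B_i] \le e^{-\Pr[B_i]}$$
so, multiplying over $i \in I$,
$$M \le e^{-\mu}.$$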
The two upper bounds for Theorem 8.1.1 are generally quite similar; we tend to use
the second for convenience. In many asymptotic instances a simple calculation gives
$M \sim e^{-\mu}$. In particular, this is always the case when $\epsilon = o(1)$ and $\epsilon\mu = o(1)$.

Perhaps the simplest example of Theorem 8.1.1 is the asymptotic probability that
$G(n, c/n)$ is triangle-free, given in Section 10.1. There, as is often the case, $\epsilon = o(1)$,
$\Delta = o(1)$ and $\mu$ approaches a constant $k$. In those instances $\Pr[\bigwedge_{i \in I} \overline{B_i}] \to e^{-k}$.
This is no longer the case when $\Delta$ becomes large. Indeed, when $\Delta \ge 2\mu$ the upper
bound of Theorem 8.1.1 becomes useless. Even for $\Delta$ slightly less it is improved by
the following result.


Theorem 8.1.2 [The Extended Janson Inequality] Under the assumptions of Theorem 8.1.1
and the further assumption that $\Delta \ge \mu$,

$$\Pr\left[\bigwedge_{i \in I} \overline{B_i}\right] \le e^{-\mu^2/2\Delta}.$$


Theorem 8.1.2 (when it applies) often gives a much stronger result than Chebyshev's
Inequality as used in Chapter 4. In Section 4.3 we saw $\mathrm{Var}[X] \le \mu + \Delta$ so
that

$$\Pr\left[\bigwedge_{i \in I} \overline{B_i}\right] = \Pr[X = 0] \le \frac{\mathrm{Var}[X]}{E[X]^2} \le \frac{\mu + \Delta}{\mu^2}.$$

Suppose $\mu \to \infty$, $\mu \ll \Delta$ and $\gamma = \mu^2/\Delta \to \infty$. Chebyshev's upper bound on
$\Pr[X = 0]$ is then roughly $\gamma^{-1}$ while Janson's upper bound is roughly $e^{-\gamma}$.
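The quantities $\mu$, $\Delta$, $M$ are explicit for triangles in $G(n,p)$, so the bounds of Theorem 8.1.1 can be compared with the exact probability of being triangle-free on a tiny instance. The sketch below is only an illustration: the parameters $n = 5$, $p = 0.3$ and all function names are our own choices, and the brute-force enumeration is feasible only for very small $n$.

```python
from itertools import combinations
from math import comb, exp


def exact_triangle_free_probability(n, p):
    """Sum the probabilities of all triangle-free graphs on n labelled vertices."""
    edges = list(combinations(range(n), 2))
    index = {e: k for k, e in enumerate(edges)}
    # each triangle as a bitmask of its three edges
    triangles = [
        (1 << index[(a, b)]) | (1 << index[(a, c)]) | (1 << index[(b, c)])
        for a, b, c in combinations(range(n), 3)
    ]
    total = 0.0
    for mask in range(1 << len(edges)):
        if any(mask & t == t for t in triangles):
            continue                          # this graph contains a triangle
        m = bin(mask).count("1")
        total += p**m * (1 - p) ** (len(edges) - m)
    return total


def janson_quantities(n, p):
    """mu, Delta, M and epsilon for the events 'triangle i is present'."""
    count = comb(n, 3)
    mu = count * p**3
    # ordered pairs of distinct triangles sharing an edge: choose a triangle,
    # one of its 3 edges, and a third vertex outside it; the union has 5 edges
    delta = count * 3 * (n - 3) * p**5
    M = (1 - p**3) ** count
    return mu, delta, M, p**3


n, p = 5, 0.3
mu, delta, M, eps = janson_quantities(n, p)
exact = exact_triangle_free_probability(n, p)
upper1 = M * exp(delta / (2 * (1 - eps)))     # first upper bound of Theorem 8.1.1
upper2 = exp(-mu + delta / 2)                 # second upper bound of Theorem 8.1.1
chebyshev = (mu + delta) / mu**2              # bound from Var[X] <= mu + Delta
```

On such a small instance the Chebyshev bound exceeds one and says nothing, while both Janson bounds stay close to $M$.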


8.2 THE PROOFS 


The original proofs of Janson are based on estimates of the Laplace transform of an
appropriate random variable. The proof we present here follows that of Boppana and
Spencer (1989). We shall use the inequalities

$$\Pr\left[B_i \,\Big|\, \bigwedge_{j \in J} \overline{B_j}\right] \le \Pr[B_i],$$

valid for all index sets $J \subset I$, $i \notin J$ and

$$\Pr\left[B_i \,\Big|\, B_k \wedge \bigwedge_{j \in J} \overline{B_j}\right] \le \Pr[B_i \mid B_k],$$

valid for all index sets $J \subset I$, $i, k \notin J$. The first follows from Theorem 6.3.2. The
second is equivalent to the first since conditioning on $B_k$ is the same as assuming
$p_r = \Pr[r \in R] = 1$ for all $r \in A_k$.
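Proof [Theorem 8.1.1]. The lower bound follows immediately. Order the index set
$I = \{1, \ldots, m\}$ for convenience. For $1 \le i \le m$,
$$\Pr\left[B_i \,\Big|\, \bigwedge_{1 \le j < i} \overline{B_j}\right] \le \Pr[B_i],$$
so
$$\Pr\left[\overline{B_i} \,\Big|\, \bigwedge_{1 \le j < i} \overline{B_j}\right] \ge \Pr[\overline{B_i}]$$
and
$$\Pr\left[\bigwedge_{i \in I} \overline{B_i}\right] = \prod_{i=1}^{m} \Pr\left[\overline{B_i} \,\Big|\, \bigwedge_{1 \le j < i} \overline{B_j}\right] \ge \prod_{i=1}^{m} \Pr[\overline{B_i}].$$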


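Now the first upper bound. For a given $i$ renumber, for convenience, so that $i \sim j$
for $1 \le j \le d$ and not for $d+1 \le j < i$. We use the inequality $\Pr[A \mid B \wedge C] \ge
\Pr[A \wedge B \mid C]$, valid for any $A, B, C$. With $A = B_i$, $B = \overline{B_1} \wedge \cdots \wedge \overline{B_d}$ and
$C = \overline{B_{d+1}} \wedge \cdots \wedge \overline{B_{i-1}}$,

$$\Pr\left[B_i \,\Big|\, \bigwedge_{1 \le j < i} \overline{B_j}\right] = \Pr[A \mid B \wedge C] \ge \Pr[A \wedge B \mid C]
= \Pr[A \mid C] \Pr[B \mid A \wedge C].$$

From the mutual independence $\Pr[A \mid C] = \Pr[A]$. We bound

$$\Pr[B \mid A \wedge C] \ge 1 - \sum_{j=1}^{d} \Pr[B_j \mid B_i \wedge C] \ge 1 - \sum_{j=1}^{d} \Pr[B_j \mid B_i]$$

from the Correlation Inequality. Thus

$$\Pr\left[B_i \,\Big|\, \bigwedge_{1 \le j < i} \overline{B_j}\right] \ge \Pr[B_i] - \sum_{j=1}^{d} \Pr[B_j \wedge B_i].$$

Reversing,

$$\Pr\left[\overline{B_i} \,\Big|\, \bigwedge_{1 \le j < i} \overline{B_j}\right] \le \Pr[\overline{B_i}] + \sum_{j=1}^{d} \Pr[B_j \wedge B_i]
\le \Pr[\overline{B_i}] \left(1 + \frac{1}{1-\epsilon} \sum_{j=1}^{d} \Pr[B_j \wedge B_i]\right)$$

since $\Pr[\overline{B_i}] \ge 1 - \epsilon$. Employing the inequality $1 + x \le e^x$,

$$\Pr\left[\overline{B_i} \,\Big|\, \bigwedge_{1 \le j < i} \overline{B_j}\right] \le \Pr[\overline{B_i}] \exp\left(\frac{1}{1-\epsilon} \sum_{j=1}^{d} \Pr[B_j \wedge B_i]\right).$$

For each $1 \le i \le m$ we plug this inequality into

$$\Pr\left[\bigwedge_{i \in I} \overline{B_i}\right] = \prod_{i=1}^{m} \Pr\left[\overline{B_i} \,\Big|\, \bigwedge_{1 \le j < i} \overline{B_j}\right].$$

The terms $\Pr[\overline{B_i}]$ multiply to $M$. The exponents add: for each $i, j \in I$ with $j < i$
and $j \sim i$ the term $\Pr[B_j \wedge B_i]$ appears once so they add to $\Delta/2$.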
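For the second upper bound we instead bound

$$\Pr\left[\overline{B_i} \,\Big|\, \bigwedge_{1 \le j < i} \overline{B_j}\right] \le 1 - \Pr[B_i] + \sum_{j=1}^{d} \Pr[B_j \wedge B_i]
\le \exp\left(-\Pr[B_i] + \sum_{j=1}^{d} \Pr[B_j \wedge B_i]\right).$$

Again, the $\Pr[B_j \wedge B_i]$ terms add to $\Delta/2$ while the $-\Pr[B_i]$ terms add to $-\mu$.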
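Proof [Theorem 8.1.2]. The second upper bound of Theorem 8.1.1 may be rewritten
$$-\ln \Pr\left[\bigwedge_{i \in I} \overline{B_i}\right] \ge \sum_{i \in I} \Pr[B_i] - \frac{1}{2} \sum_{i \sim j} \Pr[B_i \wedge B_j].$$
For any set of indices $S \subset I$ the same inequality applied only to $\{B_i\}_{i \in S}$ gives
$$-\ln \Pr\left[\bigwedge_{i \in S} \overline{B_i}\right] \ge \sum_{i \in S} \Pr[B_i] - \frac{1}{2} \sum_{i,j \in S,\, i \sim j} \Pr[B_i \wedge B_j].$$
Let now $S$ be a random subset of $I$ given by

$$\Pr[i \in S] = p,$$

with $p$ a constant to be determined, the events mutually independent. (Here we are
using probabilistic methods to prove a probability theorem!) Each term $\Pr[B_i]$ then
appears with probability $p$ and each term $\Pr[B_i \wedge B_j]$ with probability $p^2$ so that

$$E\left[-\ln \Pr\left[\bigwedge_{i \in S} \overline{B_i}\right]\right] \ge E\left[\sum_{i \in S} \Pr[B_i]\right] - \frac{1}{2} E\left[\sum_{i,j \in S,\, i \sim j} \Pr[B_i \wedge B_j]\right]
= p\mu - p^2 \frac{\Delta}{2}.$$

We set

$$p = \frac{\mu}{\Delta}$$

so as to maximize this quantity. The added assumption of Theorem 8.1.2 assures us
that the probability $p$ is at most 1. Then

$$E\left[-\ln \Pr\left[\bigwedge_{i \in S} \overline{B_i}\right]\right] \ge \frac{\mu^2}{2\Delta}.$$

Thus there is a specific $S \subset I$ for which

$$-\ln \Pr\left[\bigwedge_{i \in S} \overline{B_i}\right] \ge \frac{\mu^2}{2\Delta}.$$

That is,
$$\Pr\left[\bigwedge_{i \in S} \overline{B_i}\right] \le e^{-\mu^2/2\Delta}.$$

But

$$\Pr\left[\bigwedge_{i \in I} \overline{B_i}\right] \le \Pr\left[\bigwedge_{i \in S} \overline{B_i}\right],$$

completing the proof.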


8.3 BRUN’S SIEVE 


The more traditional approach to the Poisson Paradigm is called Brun's sieve, for
its use by the number theorist T. Brun. Let $B_1, \ldots, B_m$ be events, $X_i$ the indicator
random variable for $B_i$ and $X = X_1 + \cdots + X_m$ the number of $B_i$ that hold. Let
there be a hidden parameter $n$ (so that actually $m = m(n)$, $B_i = B_i(n)$, $X = X(n)$),
which will define our $o$, $O$ notation. Define

$$S^{(r)} = \sum \Pr[B_{i_1} \wedge \cdots \wedge B_{i_r}],$$

the sum over all sets $\{i_1, \ldots, i_r\} \subseteq \{1, \ldots, m\}$, and put
$$X^{(r)} = X(X-1)\cdots(X-r+1).$$
The inclusion-exclusion principle gives that
$$\Pr[X = 0] = \Pr[\overline{B_1} \wedge \cdots \wedge \overline{B_m}] = 1 - S^{(1)} + S^{(2)} - \cdots + (-1)^r S^{(r)} \cdots.$$

Theorem 8.3.1 Suppose there is a constant $\mu$ so that

$$E[X] = S^{(1)} \to \mu$$

and such that for every fixed $r$,

$$E\left[\frac{X^{(r)}}{r!}\right] = S^{(r)} \to \frac{\mu^r}{r!}.$$

Then
$$\Pr[X = 0] \to e^{-\mu}$$
and indeed for every $t$

$$\Pr[X = t] \to \frac{\mu^t}{t!} e^{-\mu}.$$
Proof. We do only the case $t = 0$. Fix $\epsilon > 0$. Choose $s$ so that

$$\left|\sum_{r=0}^{2s} (-1)^r \frac{\mu^r}{r!} - e^{-\mu}\right| \le \frac{\epsilon}{2}.$$

The Bonferroni Inequalities state that, in general, the inclusion-exclusion formula
alternately over- and underestimates $\Pr[X = 0]$. In particular,

$$\Pr[X = 0] \le \sum_{r=0}^{2s} (-1)^r S^{(r)}.$$

Select $n_0$ (the hidden variable) so that for $n \ge n_0$,

$$\left|S^{(r)} - \frac{\mu^r}{r!}\right| \le \frac{\epsilon}{2(2s+1)}$$

for $0 \le r \le 2s$. For such $n$
$$\Pr[X = 0] \le e^{-\mu} + \epsilon.$$

Similarly, taking the sum to $2s + 1$ we find $n_0$ so that for $n \ge n_0$,
$$\Pr[X = 0] \ge e^{-\mu} - \epsilon.$$

As $\epsilon$ was arbitrary $\Pr[X = 0] \to e^{-\mu}$.
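The alternation asserted by the Bonferroni Inequalities can be observed numerically. The sketch below makes the simplifying assumption that the events are mutually independent, so that $S^{(r)}$ is the $r$-th elementary symmetric polynomial of the probabilities and $\Pr[X = 0]$ is a plain product; the probabilities and the name `symmetric_sums` are hypothetical.

```python
from math import prod


def symmetric_sums(probs):
    """S^(r), r = 0..m, for mutually independent events with the given probabilities."""
    S = [1.0] + [0.0] * len(probs)
    for q in probs:
        # update from high r to low r so each probability is used at most once per term
        for r in range(len(probs), 0, -1):
            S[r] += q * S[r - 1]
    return S


probs = [0.1, 0.2, 0.05, 0.3, 0.15]          # hypothetical Pr[B_i]
S = symmetric_sums(probs)
exact = prod(1 - q for q in probs)           # Pr[X = 0] under independence

partial = []                                 # partial[r] = sum_{t <= r} (-1)^t S^(t)
running = 0.0
for r, s_r in enumerate(S):
    running += (-1) ** r * s_r
    partial.append(running)
```

Truncating after an even index overestimates `exact`, truncating after an odd index underestimates it, and the full sum recovers it.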


The threshold functions for $G \sim G(n,p)$ to contain a copy of a given graph $H$,
derived in Section 10.1 via the Janson Inequality, were originally found using Brun's
sieve. Here is an example where both methods are used. Let $G \sim G(n,p)$, the
random graph of Chapter 10. Let EPIT represent the statement that every vertex
lies in a triangle.


Theorem 8.3.2 Let $c > 0$ be fixed and let $p = p(n)$, $\mu = \mu(n)$ satisfy

$$\binom{n-1}{2} p^3 = \mu, \qquad e^{-\mu} = \frac{c}{n}.$$

Then
$$\lim_{n \to \infty} \Pr[G(n,p) \text{ satisfies EPIT}] = e^{-c}.$$

In Spencer (1990b) threshold functions are found for a very wide class of “extension
statements” that every $r$ vertices lie in a copy of some fixed $H$.


Proof. First fix $x \in V(G)$. For each unordered $y, z \in V(G) - \{x\}$ let $B_{xyz}$ be
the event that $\{x,y,z\}$ is a triangle of $G$. Let $C_x$ be the event $\bigwedge_{y,z} \overline{B_{xyz}}$ and $X_x$
the corresponding indicator random variable. We use Janson's Inequality to bound
$E[X_x] = \Pr[C_x]$. Here $p = o(1)$ so $\epsilon = o(1)$. $\sum \Pr[B_{xyz}] = \mu$ as defined above.
Dependency $xyz \sim xuv$ occurs if and only if the sets overlap (other than in $x$).
Hence

$$\Delta = \sum_{y,z,z'} \Pr[B_{xyz} \wedge B_{xyz'}] = O(n^3) p^5 = o(1)$$

since $p = n^{-2/3+o(1)}$. Thus

$$E[X_x] \sim e^{-\mu} = \frac{c}{n}.$$

Now define

$$X = \sum_{x \in V(G)} X_x,$$

the number of vertices $x$ not lying in a triangle. Then from linearity of expectation,

$$E[X] = \sum_{x \in V(G)} E[X_x] \to c.$$

We need to show that the Poisson Paradigm applies to $X$. Fix $r$. Then

$$E\left[\binom{X}{r}\right] = S^{(r)} = \sum \Pr[C_{x_1} \wedge \cdots \wedge C_{x_r}],$$

the sum over all sets of vertices $\{x_1, \ldots, x_r\}$. All $r$-sets look alike so
$$E\left[\binom{X}{r}\right] = \binom{n}{r} \Pr[C_{x_1} \wedge \cdots \wedge C_{x_r}] \sim \frac{n^r}{r!} \Pr[C_{x_1} \wedge \cdots \wedge C_{x_r}],$$
where $x_1, \ldots, x_r$ are some particular vertices. But

$$C_{x_1} \wedge \cdots \wedge C_{x_r} = \bigwedge \overline{B_{x_i yz}},$$

the conjunction over $1 \le i \le r$ and all $y, z$. We apply Janson's Inequality to this
conjunction. Again $\epsilon = p^3 = o(1)$. The number of $\{x_i, y, z\}$ is $r\binom{n-1}{2} - O(n)$, the
overcount coming from those triangles containing two (or three) of the $x_i$. (Here it
is crucial that $r$ is fixed.) Thus

$$\sum \Pr[B_{x_i yz}] = p^3 \left(r\binom{n-1}{2} - O(n)\right) = r\mu + O(n^{-1+o(1)}).$$

As before $\Delta$ is $p^5$ times the number of pairs $x_i yz \sim x_j y'z'$. There are $O(rn^3) =
O(n^3)$ terms with $i = j$ and $O(r^2 n^2) = O(n^2)$ terms with $i \ne j$ so again $\Delta = o(1)$.
Therefore

$$\Pr[C_{x_1} \wedge \cdots \wedge C_{x_r}] \sim e^{-r\mu}$$

and

$$E\left[\binom{X}{r}\right] \sim \frac{n^r}{r!} e^{-r\mu} = \frac{c^r}{r!}.$$

Hence the conditions of Theorem 8.3.1 are met for $X$.


8.4 LARGE DEVIATIONS 


We return to the formulation of Section 8.1. Our object is to derive large deviation
results on $X$ similar to those in Appendix A. Given a point in the probability space
(i.e., a selection of $R$) we call an index set $J \subseteq I$ a disjoint family (abbreviated
disfam) if

• $B_j$ for every $j \in J$.
• For no $j, j' \in J$ is $j \sim j'$.

If, in addition,

• If $j' \notin J$ and $B_{j'}$ then $j \sim j'$ for some $j \in J$,

then we call $J$ a maximal disjoint family (maxdisfam). We give some general results
on the possible sizes of maxdisfams. The connection to $X$ must then be done on an
ad hoc basis.
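A maxdisfam always exists and one can be found greedily. The sketch below assumes the input lists exactly those sets $A_i$ whose events $B_i$ hold (that is, $A_i \subseteq R$); the function name and the sample sets are hypothetical.

```python
def maximal_disjoint_family(sets_that_hold):
    """Greedily choose indices of pairwise disjoint sets; the result cannot be extended."""
    chosen, used = [], set()
    for idx, A in enumerate(sets_that_hold):
        if used.isdisjoint(A):        # A meets none of the sets already chosen
            chosen.append(idx)
            used |= set(A)
    return chosen


# Five sets A_i, all assumed to lie inside R.
held = [{1, 2}, {2, 3}, {3, 4}, {5}, {4, 5}]
J = maximal_disjoint_family(held)
```

Here the greedy family has size three, while the indices 1 and 4 also form a maxdisfam of size two. Different maxdisfams may therefore differ in size, which is why the lemmas that follow bound the probability for each size $s$ separately.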


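Lemma 8.4.1 With the above notation and for any integer $s$,

$$\Pr[\text{there exists a disfam } J, |J| = s] \le \frac{\mu^s}{s!}.$$

Proof. Let $\sum^{*}$ denote the sum over all $s$-sets $J \subseteq I$ with no $j \sim j'$. Let $\sum^{o}$ denote
the sum over ordered $s$-tuples $(j_1, \ldots, j_s)$ with $\{j_1, \ldots, j_s\}$ forming such a $J$. Let
$\sum^{a}$ denote the sum over all ordered $s$-tuples $(j_1, \ldots, j_s)$. Then

$$\Pr[\text{there exists a disfam } J, |J| = s] \le {\sum}^{*} \Pr\left[\bigwedge_{j \in J} B_j\right]
= {\sum}^{*} \prod_{j \in J} \Pr[B_j] = \frac{1}{s!} {\sum}^{o} \Pr[B_{j_1}] \cdots \Pr[B_{j_s}]$$
$$\le \frac{1}{s!} {\sum}^{a} \Pr[B_{j_1}] \cdots \Pr[B_{j_s}] = \frac{1}{s!} \left(\sum_{i \in I} \Pr[B_i]\right)^s = \frac{\mu^s}{s!}.$$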
Lemma 8.4.1 gives an effective upper bound when $\mu^s \ll s!$, basically if $s > \mu\alpha$
for $\alpha > e$. For smaller $s$ we look at the further condition of $J$ being a maxdisfam. To
that end we let $\mu_s$ denote the minimum, over all $j_1, \ldots, j_s \in I$ of $\sum \Pr[B_i]$, the sum
taken over all $i \in I$ except those $i$ with $i \sim j_l$ for some $1 \le l \le s$. In application $s$
will be small (otherwise we use Lemma 8.4.1) and $\mu_s$ will be close to $\mu$. For some
applications it is convenient to set

$$\nu = \max_{j \in I} \sum_{i \sim j} \Pr[B_i]$$

and note that $\mu_s \ge \mu - s\nu$.


Lemma 8.4.2 With the above notation and for any integer $s$,

$$\Pr[\text{there exists a maxdisfam } J, |J| = s] \le \frac{\mu^s}{s!} e^{-\mu_s} e^{\Delta/2}
\le \frac{\mu^s}{s!} e^{-\mu} e^{s\nu} e^{\Delta/2}.$$


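Proof. As in Lemma 8.4.1 we bound this probability by $\sum^{*}$ of $J = \{j_1, \ldots, j_s\}$
being a maxdisfam. For this to occur $J$ must first be a disfam and then $\bigwedge^{*} \overline{B_i}$, where
$\bigwedge^{*}$ is the conjunction over all $i \in I$ except those with $i \sim j_l$ for some $1 \le l \le s$.
We apply Janson's Inequality to give an upper bound to $\Pr[\bigwedge^{*} \overline{B_i}]$. The associated
values $\mu^{*}$, $\Delta^{*}$ satisfy

$$\mu^{*} \ge \mu_s, \qquad \Delta^{*} \le \Delta,$$

the latter since $\Delta^{*}$ has simply fewer addends. Thus
$$\Pr\left[{\bigwedge}^{*} \overline{B_i}\right] \le e^{-\mu_s} e^{\Delta/2}$$

and

$${\sum}^{*} \Pr[J \text{ maxdisfam}] \le e^{-\mu_s} e^{\Delta/2} {\sum}^{*} \Pr\left[\bigwedge_{j \in J} B_j\right]
\le \frac{\mu^s}{s!} e^{-\mu_s} e^{\Delta/2}.$$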


When $\Delta = o(1)$ and $\nu\mu = o(1)$ or, more generally, $\mu_{3\mu} = \mu + o(1)$, then
Lemma 8.4.2 gives a close approximation to the Poisson distribution since
$$\Pr[\text{there exists a maxdisfam } J, |J| = s] \le (1 + o(1)) \frac{\mu^s}{s!} e^{-\mu}$$

for $s \le 3\mu$ and the probability is quite small for larger $s$ by Lemma 8.4.1.
8.5 COUNTING EXTENSIONS


We begin with a case that uses the basic large deviation results of Appendix A. 


Theorem 8.5.1 Set $p = [(\ln n)/n]\,\omega(n)$, where $\omega(n) \to \infty$ arbitrarily slowly. Then
in $G(n,p)$ almost always

$$\deg(x) \sim (n-1)p$$

for all vertices $x$.
This is actually a large deviation result. It suffices to show the following.


Theorem 8.5.2 Set $p = [(\ln n)/n]\,\omega(n)$, where $\omega(n) \to \infty$ arbitrarily slowly. Let
$x \in G$ be fixed. Fix $\epsilon > 0$. Then

$$\Pr[|\deg(x) - (n-1)p| > \epsilon(n-1)p] = o(n^{-1}).$$


Proof. As $\deg(x) \sim B(n-1, p)$, that is, it is a binomial random variable with the
above parameters, we have from Corollary A.1.14 that

$$\Pr[|\deg(x) - (n-1)p| > \epsilon(n-1)p] < 2e^{-c_\epsilon (n-1)p} = o(n^{-1}),$$
as $c_\epsilon$ is fixed and $(n-1)p \gg \ln n$.


This result illustrates why logarithmic terms appear so often in the study of random
graphs. We want every $x$ to have a property, hence we try to get the failure probability
down to $o(n^{-1})$. When the Poisson Paradigm applies the failure probability is roughly
an exponential, and hence we want the exponent to be logarithmic. This often leads
to a logarithmic term for the edge probability $p$.

In Section 8.3 we found the threshold function for every vertex to lie on a triangle.
It basically occurred when the expected number of extensions of a given vertex to a
triangle reached $\ln n$. Now set $N(x)$ to be the number of triangles containing $x$. Set
$\mu = \binom{n-1}{2} p^3 = E[N(x)]$.


Theorem 8.5.3 Let $p$ be such that $\mu \gg \ln n$. Then almost always
$$N(x) \sim \mu$$
for all $x \in G(n,p)$.


As above, this is actually a large deviation result. We actually show the following.


Theorem 8.5.4 Let $p$ be such that $\mu \gg \ln n$. Let $x \in G$ be fixed. Fix $\epsilon > 0$. Then

$$\Pr[|N(x) - \mu| > \epsilon\mu] = o(n^{-1}).$$


Proof. We shall prove this under the further assumption $p = n^{-2/3+o(1)}$ (or equivalently,
$\mu = n^{o(1)}$), which could be removed by technical methods. We now have,
in the notation of Lemmas 8.4.1 and 8.4.2, $\nu\mu, \Delta = o(1)$. Let $P$ denote the Poisson
distribution with mean $\mu$. Then

$$\Pr[\text{there exists a maxdisfam } J, |J| \le \mu(1-\epsilon)] \le (1 + o(1)) \Pr[P \le \mu(1-\epsilon)],$$
$$\Pr[\text{there exists a maxdisfam } J, \mu(1+\epsilon) \le |J| \le 3\mu] \le (1 + o(1)) \Pr[\mu(1+\epsilon) \le P \le 3\mu],$$
$$\Pr[\text{there exists a maxdisfam } J, |J| \ge 3\mu] \le \Pr[\text{there exists a disfam } J, |J| \ge 3\mu]
\le \sum_{s=3\mu}^{\infty} \frac{\mu^s}{s!} = O((1-c)^{\mu}),$$

where $c > 0$ is an absolute constant. Since $\mu \gg \ln n$ the third term is $o(n^{-1})$. The
first and second terms are $o(n^{-1})$ by Theorem A.1.15. With probability $1 - o(n^{-1})$
every maxdisfam $J$ has size between $(1-\epsilon)\mu$ and $(1+\epsilon)\mu$.

Fix one such $J$. (There always is some maximal disfam; even if no $B_i$ held
we could take $J = \emptyset$.) The elements of $J$ are triples $xyz$ that form triangles, hence
$N(x) \ge |J| \ge (1-\epsilon)\mu$. The upper bound is ad hoc. The probability that there exist
five triangles of the form $xyz_1, xyz_2, xyz_3, xyz_4, xyz_5$ is at most $n^6 p^{11} = o(n^{-1})$.
The probability that there exist triangles $xy_iz_i, xy_iz_i'$, $1 \le i \le 4$, all vertices distinct,
is at most $n^{12} p^{20} = o(n^{-1})$. Consider the graph whose vertices are the triangles
$xyz$, with $\sim$ giving the edge relation. There are $N(x)$ vertices, the maxdisfam $J$
are the maximal independent sets. In this graph, with probability $1 - o(n^{-1})$, each
vertex $xyz$ has degree at most nine and there is no set of four disjoint edges. This
implies that for any $J$, $|J| \ge N(x) - 27$ and

$$N(x) \le (1+\epsilon)\mu + 27 \le (1+\epsilon')\mu,$$

where $\epsilon' > \epsilon$ is any fixed constant and $n$ is sufficiently large.
For any graph $H$ with “roots” $x_1, \ldots, x_r$ we can examine in $G(n,p)$ the number
of extensions $N(x_1, \ldots, x_r)$ of a given set of $r$ vertices to a copy of $H$. In Spencer
(1990a) some general results are given that generalize Theorems 8.5.2 and 8.5.4.
Under fairly wide assumptions (see Exercise 5, Chapter 10), when the expected
number $\mu$ of extensions satisfies $\mu \gg \ln n$ then almost always all $N(x_1, \ldots, x_r) \sim \mu$.


8.6 COUNTING REPRESENTATIONS 


The results of this section shall use the following very basic and very useful result. 


Lemma 8.6.1 [The Borel-Cantelli Lemma] Let $\{A_n\}_{n \in N}$ be events with

$$\sum_{n=1}^{\infty} \Pr[A_n] < \infty.$$

Then

$$\Pr\left[\bigwedge_{i=1}^{\infty} \bigvee_{j=i}^{\infty} A_j\right] = 0.$$

That is, almost always $A_n$ is false for all sufficiently large $n$. In application we shall
aim for $\Pr[A_n] < n^{-c}$ with $c > 1$ in order to apply this lemma.

Again we begin with a case that involves only the large deviation results of
Appendix A. For a given set $S$ of natural numbers let (for every $n \in N$) $f(n) = f_S(n)$
denote the number of representations $n = x + y$, $x, y \in S$, $x < y$.


Theorem 8.6.2 [Erdős (1956)] There is a set $S$ for which $f(n) = \Theta(\ln n)$. That is,
there is a set $S$ and constants $c_1, c_2$ so that for all sufficiently large $n$

$$c_1 \ln n \le f(n) \le c_2 \ln n.$$


Proof. Define $S$ randomly by
$$\Pr[x \in S] = p_x = \min\left[10\sqrt{\frac{\ln x}{x}},\, 1\right].$$

Fix $n$. Now $f(n)$ is a random variable with mean

$$\mu = E[f(n)] = \frac{1}{2} \sum_{x+y=n,\; x \ne y} p_x p_y.$$

Roughly there are $n$ addends with $p_x p_y > p_n^2 = (100 \ln n)/n$. We have $p_x p_y =
O((\ln n)/n)$ except in the regions $x = o(n)$, $y = o(n)$ and care must be taken that
those terms don't contribute significantly to $\mu$. Careful asymptotics (and first year
calculus!) yield

$$\mu \sim (50 \ln n) \int_0^1 \frac{dx}{\sqrt{x(1-x)}} = 50\pi \ln n.$$

The negligible effect of the $x = o(n)$, $y = o(n)$ terms reflects the finiteness of the
indefinite integral at poles $x = 0$ and $x = 1$. The possible representations $x + y = n$
are mutually independent events so that from Corollary A.1.14,

$$\Pr[|f(n) - \mu| > \epsilon\mu] < 2e^{-\delta\mu}$$
for constants $\epsilon$, $\delta = \delta(\epsilon)$. To be specific we can take $\epsilon = 0.9$, $\delta = 0.1$ and
$$\Pr[|f(n) - \mu| > 0.9\mu] < 2e^{-5\pi \ln n} < n^{-1.1}$$

for $n$ sufficiently large. Take $c_1 < 0.1(50\pi)$ and $c_2 > 1.9(50\pi)$.
Let $A_n$ be the event that $c_1 \ln n \le f(n) \le c_2 \ln n$ does not hold. We have
$\Pr[A_n] < n^{-1.1}$ for $n$ sufficiently large. The Borel-Cantelli Lemma applies, almost
always all $A_n$ fail for $n$ sufficiently large. Thus there exists a specific point in the
probability space, that is, a specific set $S$, for which $c_1 \ln n \le f(n) \le c_2 \ln n$ for all
sufficiently large $n$.
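The asymptotic value $\mu \sim 50\pi \ln n$ can be compared with the exact finite sum. The sketch below assumes the membership probability $p_x = \min[10\sqrt{(\ln x)/x}, 1]$ from the proof; the choice $n = 100000$ and the function names are ours. Since $\ln x < \ln n$ for every addend, the ratio stays below one and approaches it only slowly.

```python
from math import log, pi, sqrt


def p(x):
    """Membership probability p_x = min(10 * sqrt(ln x / x), 1)."""
    return min(10 * sqrt(log(x) / x), 1.0)


def mean_representations(n):
    """mu = E[f(n)] = (1/2) * sum of p_x * p_y over x + y = n with x != y."""
    total = 0.0
    for x in range(1, n):
        y = n - x
        if x != y:
            total += p(x) * p(y)
    return total / 2


n = 100_000
mu = mean_representations(n)
ratio = mu / (50 * pi * log(n))   # tends to 1 as n grows, from below
```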


The development of the infinite probability space used here, and below, has been 
carefully done in the book Sequences by H. Halberstam and K. F. Roth (1983). 

The use of the infinite probability space leaves a number of questions about the 
existential nature of the proof that go beyond the algorithmic. For example, does 
there exist a recursive set S having the property of Theorem 8.6.2? An affirmative 
answer is given in Kolountzakis (1999). 

Now for a given set $S$ of natural numbers let $g(n) = g_S(n)$ denote the number of
representations $n = x + y + z$, $x, y, z \in S$, $x < y < z$. The following result was
actually proved for representations of $n$ as the sum of $k$ terms for any fixed $k$. For
simplicity we present here only the proof for $k = 3$.


Theorem 8.6.3 [Erdős and Tetali (1990)] There is a set $S$ for which $g(n) = \Theta(\ln n)$.
That is, there is a set $S$ and constants $c_1, c_2$ so that for all sufficiently large $n$,

$$c_1 \ln n \le g(n) \le c_2 \ln n.$$


Proof. Define $S$ randomly by

$$\Pr[x \in S] = p_x = \min\left[10\left(\frac{\ln x}{x^2}\right)^{1/3},\, \frac{1}{2}\right].$$

Fix $n$. Now $g(n)$ is a random variable and

$$\mu = E[g(n)] = \sum_{x+y+z=n} p_x p_y p_z.$$
Careful asymptotics give
$$\mu \sim \frac{10^3}{6} \ln n \int_{x=0}^{1} \int_{y=0}^{1-x} \frac{dx\,dy}{[xy(1-x-y)]^{2/3}} = K \ln n,$$

where $K$ is large. (We may make $K$ arbitrarily large by increasing “10.”) We apply
Lemma 8.4.2. Here

$$\Delta = \sum p_x p_y p_z p_{y'} p_{z'},$$

the sum over all five-tuples with $x + y + z = x + y' + z' = n$. Roughly there are
$n^3$ terms, each $\sim p_n^5 = n^{-10/3+o(1)}$ so that the sum is $o(1)$. Again, care must be
taken that those terms with one (or more) small variables don't contribute much to
the sum. We bound $s \le 3\mu = O(\ln n)$ and consider $\mu_s$. This is the minimal possible
$\sum p_x p_y p_z$ over all those $x, y, z$ with $x + y + z = n$ that do not intersect a given set
of $s$ representations; let us weaken that and say a given set of $3s$ elements. Again
one needs that the weight of $\sum_{x+y+z=n} p_x p_y p_z$ is not on the edges but “spread” in
the center and one shows $\mu_s \sim \mu$. Now, as in Section 8.5, let $P$ denote the Poisson
distribution with mean $\mu$. The probability that there exists a maxdisfam $J$ of size less
than $\mu(1-\epsilon)$ or between $\mu(1+\epsilon)$ and $3\mu$ is asymptotically the probability that $P$
lies in that range. For moderate $\epsilon$, as $K$ is large, these (as well as the probability
of having a disfam of size bigger than $3\mu$) will be $o(n^{-c})$ with $c > 1$. By the
Borel-Cantelli Lemma almost always all sufficiently large $n$ will have all maxdisfam
$J$ of size between $c_1 \ln n$ and $c_2 \ln n$. Then $g(n) \ge c_1 \ln n$ immediately.

The upper bound is again ad hoc. With this $p$ let $f(n)$ be, as before, the number
of representations of $n$ as the sum of two elements of $S$. We use only that $p_x =
x^{-2/3+o(1)}$. We calculate

$$E[f(n)] = \sum_{x+y=n} (xy)^{-2/3+o(1)} = n^{-1/3+o(1)},$$

again watching the “pole” at 0. Here the possible representations are mutually
independent so

$$\Pr[f(n) \ge 4] \le E[f(n)]^4/4! = n^{-4/3+o(1)}$$

and by the Borel-Cantelli Lemma almost always $f(n) \le 3$ for all sufficiently large
$n$. But then almost always there is a $C$ so that $f(n) \le C$ for all $n$. For all sufficiently
large $n$ there is a maxdisfam (with representations as the sum of three terms) of size
less than $c_2 \ln n$. Every triple $x, y, z \in S$ with $x + y + z = n$ must contain at least one
of these at most $3c_2 \ln n$ points. The number of triples $x, y, z \in S$ with $x + y + z = n$
for a particular $x$ is simply $f(n - x)$, the number of representations $n - x = y + z$
(possibly one less since $y, z \ne x$), and so is at most $C$. But then there are at most
$C(3c_2 \ln n)$ total representations $n = x + y + z$.


8.7 FURTHER INEQUALITIES 


Here we discuss some further results that allow one, sometimes, to apply the Poisson
Paradigm. Let $B_i$, $i \in I$ be events in an arbitrary probability space. As in the
Lovász Local Lemma of Chapter 5 we say that a symmetric binary relation ‘$\sim$’ on
$I$ is a dependency digraph if for each $i \in I$ the event $B_i$ is mutually independent of
$\{B_j : i \not\sim j\}$. [The digraph of Section 5.1 has $E = \{(i,j) : i \sim j\}$.] Suppose the
events $B_i$ satisfy the inequalities of Section 8.2:

$$\Pr\left[B_i \,\Big|\, \bigwedge_{j \in J} \overline{B_j}\right] \le \Pr[B_i]$$

valid for all index sets $J \subset I$, $i \notin J$ and

$$\Pr\left[B_i \,\Big|\, B_k \wedge \bigwedge_{j \in J} \overline{B_j}\right] \le \Pr[B_i \mid B_k]$$

valid for all index sets $J \subset I$, $i, k \notin J$. Then the Janson Inequalities in Theorems 8.1.1
and 8.1.2 and also Lemmas 8.4.1 and 8.4.2 hold as stated. The proofs are
identical; the above are the only properties of the events $B_i$ that were used.

Suen (1990) [see also Janson (1998) for significant variations] has given a very
general result that allows the approximation of $\Pr[\bigwedge_{i \in I} \overline{B_i}]$ by $M = \prod_{i \in I} \Pr[\overline{B_i}]$.
Again let $\{B_i\}_{i \in I}$ be events in an arbitrary probability space. We say that a binary
relation $\sim$ on $I$ is a superdependency digraph if the following holds: Let $J_1, J_2 \subset I$
be disjoint subsets so that $j_1 \sim j_2$ for no $j_1 \in J_1$, $j_2 \in J_2$. Let $B^1$ be any Boolean
combination of the events $\{B_j\}_{j \in J_1}$ and let $B^2$ be any Boolean combination of the
events $\{B_j\}_{j \in J_2}$. Then $B^1$, $B^2$ are independent. Note that the ‘$\sim$’ of Section 8.1 is
indeed a superdependency digraph.


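Theorem 8.7.1 [Suen] Under the above conditions,

$$\left|\Pr\left[\bigwedge_{i \in I} \overline{B_i}\right] - M\right| \le M\left[e^{\sum_{i \sim j} y(i,j)} - 1\right],$$

where

$$y(i,j) = \left(\Pr[B_i \wedge B_j] + \Pr[B_i]\Pr[B_j]\right) \prod_{l \sim i \text{ or } l \sim j} (1 - \Pr[B_l])^{-1}.$$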


We shall not prove Theorem 8.7.1. In many instances the above product is not
large. Suppose it is less than two for all $i \sim j$. In that instance

$$\sum_{i \sim j} y(i,j) \le 2\left(\Delta + \sum_{i \sim j} \Pr[B_i]\Pr[B_j]\right).$$

In many instances $\sum_{i \sim j} \Pr[B_i]\Pr[B_j]$ is small relative to $\Delta$ (as in many instances
when $i \sim j$ the events $B_i$, $B_j$ are positively correlated). When, furthermore, $\Delta =
o(1)$, Suen's Theorem gives the approximation of $\Pr[\bigwedge_{i \in I} \overline{B_i}]$ by $M$. Suen has
applied this result to examinations of the number of induced copies of a fixed graph
$H$ in the random $G(n,p)$.

Janson (1990) has given a one-way large deviation result on the $X$ of Section 8.1
that is somewhat simpler to apply than Lemmas 8.4.1 and 8.4.2.


Theorem 8.7.2 [Janson] With $\mu = E[X]$ and $\gamma > 0$ arbitrary,

$$\Pr[X \le (1-\gamma)\mu] < e^{-\gamma^2 \mu/[2 + (\Delta/\mu)]}.$$
When $\Delta = o(\mu)$ this bound on the tail approximates that of the normal curve with
mean $\mu$ and standard deviation $\sqrt{\mu}$. We shall not prove Theorem 8.7.2 here. The proofs of
Theorems 8.7.1 and 8.7.2 as well as the original proofs by Janson of Theorems 8.1.1
and 8.1.2 are based on estimations of the Laplace transform of $X$, bounding $E[e^{-tX}]$.
1. Prove that for every $\epsilon > 0$ there is some $n_0 = n_0(\epsilon)$ so that for every $n > n_0$
there is a graph on $n$ vertices containing every graph on $k \le (2 - \epsilon)\log_2 n$
vertices as an induced subgraph.


2. Find a threshold function for the property: $G(n,p)$ contains at least $n/6$
pairwise vertex disjoint triangles.


THE PROBABILISTIC LENS: 
Local Coloring 


This result of Erdős (1962) gives further probabilistic evidence that the chromatic
number $\chi(G)$ cannot be deduced from local considerations.


Theorem 1 For all $k$ there exists $\epsilon > 0$ so that for all sufficiently large $n$ there exist
graphs $G$ on $n$ vertices with $\chi(G) > k$ and yet $\chi(G|_S) \le 3$ for every set $S$ of vertices
of size at most $\epsilon n$.


Proof. For a given $k$ let $c, \epsilon > 0$ satisfy (with foresight)

$$c > 2k^2 H(1/k) \ln 2,$$

$$\epsilon < e^{-5} 3^3 c^{-3},$$

where $H(x) = -x \log_2 x - (1-x) \log_2 (1-x)$ is the entropy function. Set $p = c/n$
and let $G \sim G(n,p)$. We show that $G$ almost surely satisfies the two conditions of
the theorem.

If $\chi(G) \le k$ there would be an independent set of size $n/k$. The expected number
of such sets is

$$\binom{n}{n/k} (1-p)^{\binom{n/k}{2}} < 2^{n(H(1/k)+o(1))} e^{-cn/2k^2 (1+o(1))},$$

which is $o(1)$ by our condition on $c$. Hence almost surely $\chi(G) > k$.

Suppose some set $S$ with $t \le \epsilon n$ vertices required at least four colors. Then as in
the proof of Lemma 7.3.4 there would be a minimal such set $S$. For any $v \in S$ there
would be a three-coloring of $S - \{v\}$. If $v$ had two or fewer neighbors in $S$ then this
could be extended to a three-coloring of $S$. Hence every $v \in S$ would have degree
at least three in $G|_S$ and so $G|_S$ would have at least $3t/2$ edges. The probability that
some $t \le \epsilon n$ vertices have at least $3t/2$ edges is less than

$$\sum_{t=4}^{\epsilon n} \binom{n}{t} \binom{\binom{t}{2}}{3t/2} \left(\frac{c}{n}\right)^{3t/2}.$$

We outline the analysis. When $t = O(1)$ the terms are negligible. Otherwise we
bound each term from above by

$$\left(\frac{ne}{t}\right)^{t} \left(\frac{te}{3}\right)^{3t/2} \left(\frac{c}{n}\right)^{3t/2} \le \left[e^{5/2} 3^{-3/2} c^{3/2} \sqrt{t/n}\right]^{t}.$$

Now since $t \le \epsilon n$ the bracketed term is at most $e^{5/2} 3^{-3/2} c^{3/2} \epsilon^{1/2}$, which is less
than one by our condition on $\epsilon$. The full sum is $o(1)$; that is, almost surely no such
$S$ exists.
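To get a feeling for the sizes involved, the two conditions on $c$ and $\epsilon$ can be evaluated numerically. The sketch below picks $c$ slightly above and $\epsilon$ slightly below their thresholds; the slack factor 1.01 and the function names are our own assumptions, not part of the proof.

```python
from math import e, log, log2


def entropy(x):
    """H(x) = -x log2 x - (1 - x) log2 (1 - x), for 0 < x < 1."""
    return -x * log2(x) - (1 - x) * log2(1 - x)


def choose_constants(k, slack=1.01):
    """Return (c, eps) with c > 2 k^2 H(1/k) ln 2 and eps < e^-5 3^3 c^-3 (requires k >= 2)."""
    c = slack * 2 * k**2 * entropy(1 / k) * log(2)
    eps = (e**-5 * 27 / c**3) / slack
    return c, eps


def bracket(c, eps):
    """The quantity e^(5/2) 3^(-3/2) c^(3/2) eps^(1/2), which the proof needs below one."""
    return e**2.5 * 3**-1.5 * c**1.5 * eps**0.5


constants = {k: choose_constants(k) for k in (3, 5, 10)}
```

The resulting $\epsilon$ shrinks rapidly as $k$ grows, so the sets $S$ covered by the theorem are a small, though still linear, fraction of the vertices.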


Many tempting conjectures are easily disproved by the probabilistic method. If
every $n/\ln n$ vertices may be three-colored then can a graph $G$ on $n$ vertices be
four-colored? This result shows that the answer is no.
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Pseudorandomness 


‘A knot!’, said Alice, always ready to make herself useful, and looking anxiously 
about her. ‘Oh, do let me help to undo it!’


— from Alice in Wonderland, by Lewis Carroll 


As shown in the various chapters of this book, the probabilistic method is a powerful 
tool for establishing the existence of combinatorial structures with certain properties. 
It is often the case that such an existence proof is not sufficient; we actually prefer 
an explicit construction. This is not only because an explicit construction may shed 
more light on the corresponding problem, but also because it often happens that a 
random-looking structure is useful for a certain algorithmic procedure; in this case 
we would like to have an algorithm and not merely to prove that it exists. 

The problem of finding explicit constructions may look trivial; after all, since we 
are mainly dealing with finite cases, once we have a probabilistic proof of existence 
we can find an explicit example by exhaustive search. Moreover, many of the 
probabilistic proofs of existence actually show that most members of a properly 
chosen random space have the desired properties. We may thus expect that it would 
not be too difficult to find one such member. Although this is true in principle, it is 
certainly not practical to check all possibilities; it is thus common to define an explicit 
construction of a combinatorial object as one that can be performed efficiently; say, 
in time that is polynomial in the parameters of the object. 


The Probabilistic Method, Third Edition By Noga Alon and Joel Spencer

Let us illustrate this notion by one of the best known open problems in the area of
explicit constructions, the problem of constructing explicit Ramsey graphs. The first
example given in Chapter 1 is the proof of Erdős that for every n there are graphs on n
vertices containing neither a clique nor an independent set on $2 \log_2 n$ vertices. This
proof is an existence proof; can we actually describe such graphs explicitly? Erdős
offered a prize of $500 for the explicit construction of an infinite family of graphs,
in which there is neither a clique nor an independent set of size more than a constant
times the logarithm of the number of vertices, for some absolute constant. Of course,
we can, in principle, for every fixed n, check all graphs on n vertices until we find a
good one, but this does not give an efficient way of producing the desired graphs and
hence is not explicit. Although the problem mentioned above received a considerable
amount of attention, it is still open. The best known explicit construction is due to
Frankl and Wilson (1981), who describe explicit graphs on n vertices which contain
neither a clique nor an independent set on more than $2^{c\sqrt{\log n \log\log n}}$ vertices, for
some absolute positive constant c.

Although the problem of constructing explicit Ramsey graphs is still open, there 
are several other problems for which explicit constructions are known. In this chapter 
we present a few examples and discuss briefly some of their algorithmic applications. 
We also describe several seemingly unrelated properties of a graph, which all turn out 
to be equivalent. All these are properties of the random graph and it is thus common 
to call a graph that satisfies these properties quasirandom. The equivalence of all
these properties enables one to show, in several cases, that certain explicit graphs 
have many pseudorandom properties by merely showing that they possess one of 
them. 


9.1 THE QUADRATIC RESIDUE TOURNAMENTS 


Recall that a tournament on a set V of n players is an orientation T = (V, E) of the
set of edges of the complete graph on the set of vertices V. If (x, y) is a directed edge
we say that x beats y. Given a permutation $\pi$ of the set of players, a (directed) edge
(x, y) of the tournament is consistent with $\pi$ if x precedes y in $\pi$. If $\pi$ is viewed as
a ranking of the players, then it is reasonable to try and find rankings with as many
consistent arcs as possible. Let $c(\pi, T)$ denote the number of arcs of T which are
consistent with $\pi$, and define $c(T) = \max c(\pi, T)$, where the maximum is taken over
all permutations $\pi$ of the set of vertices of T. For every tournament T on n players, if
$\pi = 1, 2, \ldots, n$ and $\pi' = n, n-1, \ldots, 1$ then $c(\pi, T) + c(\pi', T) = \binom{n}{2}$. Therefore
$c(T) \ge \frac{1}{2}\binom{n}{2}$. In fact, it can be shown that for every such T, $c(T) \ge \frac{1}{2}\binom{n}{2} + \Omega(n^{3/2})$.
On the other hand, a simple probabilistic argument shows that there are tournaments
T on n players for which $c(T) \le (1 + o(1))\frac{1}{2}\binom{n}{2}$. [The best known estimate, which
gives the right order of magnitude for the largest possible value of the difference
$c(T) - \frac{1}{2}\binom{n}{2}$, is more complicated and was given by de la Vega (1983), where he
showed that there are tournaments T on n players for which $c(T) \le \frac{1}{2}\binom{n}{2} + O(n^{3/2})$.]

Can we describe explicitly tournaments T on n vertices in which
$c(T) \le (1 + o(1))\frac{1}{2}\binom{n}{2}$? This problem was mentioned by Erdős and Moon (1965) and by Spencer
(1985a). It turns out that several such constructions can be given. Let us describe
one.

Let $p \equiv 3 \pmod 4$ be a prime and let $T = T_p$ be the tournament whose vertices
are all elements of the finite field GF(p) in which (i, j) is a directed edge iff i − j is
a quadratic residue. [Since $p \equiv 3 \pmod 4$, −1 is a quadratic nonresidue modulo
p and hence $T_p$ is a well-defined tournament.]
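
The construction is short enough to write out. The following is a minimal sketch, not part of the original text: the function names are ours, p is assumed (not checked) to be prime, and the tournament is stored simply as a set of ordered pairs.

```python
def quadratic_residues(p):
    """Nonzero squares modulo the prime p."""
    return {(x * x) % p for x in range(1, p)}


def qr_tournament(p):
    """Arc set of T_p: (i, j) is an arc iff i - j is a quadratic residue mod p."""
    if p % 4 != 3:
        raise ValueError("need p = 3 (mod 4) so that -1 is a nonresidue")
    residues = quadratic_residues(p)
    return {(i, j) for i in range(p) for j in range(p)
            if i != j and (i - j) % p in residues}


def is_tournament(p, arcs):
    """Exactly one of (i, j), (j, i) must be an arc for every pair i != j."""
    return all(((i, j) in arcs) != ((j, i) in arcs)
               for i in range(p) for j in range(i + 1, p))


arcs = qr_tournament(7)
print(is_tournament(7, arcs))                    # every pair is oriented exactly once
print(sorted(j for (i, j) in arcs if i == 0))    # the players beaten by player 0
```

For p = 7 the residues are 1, 2 and 4, so player 0 beats exactly the players j with −j in {1, 2, 4}, namely 3, 5 and 6.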


Theorem 9.1.1 For the tournaments $T_p$ described above,

$$c(T_p) \le \frac{1}{2}\binom{p}{2} + O(p^{3/2} \log p).$$


In order to prove this theorem we need some preparations. Let $\chi$ be the quadratic
residue character defined on the elements of the finite field GF(p) by
$\chi(y) = y^{(p-1)/2}$. Equivalently, $\chi(y)$ is 1 if y is a nonzero square, 0 if y is 0 and −1
otherwise. Let $D = (d_{ij})_{i,j=0}^{p-1}$ be the p by p matrix defined by $d_{ij} = \chi(i - j)$.


Fact 1 For every two distinct j and l, $\sum_{i \in GF(p)} d_{ij} d_{il} = -1$.


Proof.

$$\sum_{i} d_{ij} d_{il} = \sum_{i} \chi(i-j)\chi(i-l) = \sum_{i \ne j,l} \chi(i-j)\chi(i-l)
= \sum_{i \ne j,l} \chi\left(\frac{i-j}{i-l}\right) = \sum_{i \ne j,l} \chi\left(1 + \frac{l-j}{i-l}\right).$$

As i ranges over all elements of GF(p) besides j and l the quantity $1 + (l-j)/(i-l)$
ranges over all elements of GF(p) besides 0 and 1. Since the sum of $\chi(r)$
over all r in GF(p) is 0 this implies that the right-hand side of the last equation is
$0 - \chi(0) - \chi(1) = -1$, completing the proof of the fact. ∎
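
Fact 1 is easy to confirm numerically for a small prime. The sketch below is only an illustration with names of our own choosing; it evaluates the character through modular exponentiation and collects the inner products of all pairs of distinct columns of D.

```python
def chi(y, p):
    """Quadratic residue character on GF(p): y^((p-1)/2), read as 1, 0 or -1."""
    v = pow(y % p, (p - 1) // 2, p)
    return -1 if v == p - 1 else v


def column_inner_product(p, j, l):
    """Sum over i in GF(p) of d_ij * d_il, where d_ij = chi(i - j)."""
    return sum(chi(i - j, p) * chi(i - l, p) for i in range(p))


p = 19
values = {column_inner_product(p, j, l)
          for j in range(p) for l in range(p) if j != l}
print(values)   # Fact 1 says this set is {-1}
```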


For two subsets A and B of GF(p), let e(A, B) denote the number of directed
edges of $T_p$ that start in a vertex of A and end in a vertex of B. By the definition of
the matrix D it follows that

$$\sum_{i \in A} \sum_{j \in B} d_{ij} = e(A, B) - e(B, A).$$

The following lemma is proved in Alon (1986b).


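Lemma 9.1.2 For any two subsets A and B of GF(p),

$$\left| \sum_{i \in A} \sum_{j \in B} d_{ij} \right| \le |A|^{1/2} |B|^{1/2} p^{1/2}.$$


Proof. By the Cauchy-Schwarz Inequality and by the fact above,

$$\left( \sum_{i \in A} \sum_{j \in B} d_{ij} \right)^2 \le |A| \sum_{i \in A} \left( \sum_{j \in B} d_{ij} \right)^2$$

$$\le |A| \sum_{i \in GF(p)} \left( \sum_{j \in B} d_{ij} \right)^2$$

$$\le |A| \sum_{i \in GF(p)} \left( |B| + 2 \sum_{j < l \in B} d_{ij} d_{il} \right)$$

$$= |A||B|p + 2|A| \sum_{j < l \in B} \sum_{i \in GF(p)} d_{ij} d_{il}$$

$$\le |A||B|p,$$

where the third step uses $d_{ij}^2 \le 1$ and the last step uses Fact 1, by which every inner
sum equals −1. This completes the proof of the lemma. ∎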
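
As a sanity check, the bound of Lemma 9.1.2 can be tested on randomly chosen subsets. This is a hedged sketch under our own choices (the prime p = 43, the seed and the number of trials are arbitrary), and it only samples subsets rather than proving anything.

```python
import math
import random


def chi(y, p):
    """Quadratic residue character on GF(p), with values 1, 0 or -1."""
    v = pow(y % p, (p - 1) // 2, p)
    return -1 if v == p - 1 else v


def signed_edge_count(A, B, p):
    """Sum of d_ij over i in A and j in B, that is e(A, B) - e(B, A) in T_p."""
    return sum(chi(i - j, p) for i in A for j in B)


def lemma_holds(A, B, p):
    bound = math.sqrt(len(A) * len(B) * p)
    return abs(signed_edge_count(A, B, p)) <= bound + 1e-9


p = 43                       # a prime congruent to 3 modulo 4
rng = random.Random(0)       # fixed seed so that the run is reproducible
trials = []
for _ in range(200):
    A = rng.sample(range(p), rng.randint(1, p))
    B = rng.sample(range(p), rng.randint(1, p))
    trials.append(lemma_holds(A, B, p))
print(all(trials))
```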


Proof [Theorem 9.1.1]. Let r be the smallest integer satisfying $2^r \ge p$. Let
$\pi = \pi_1, \ldots, \pi_p$ be an arbitrary permutation of the vertices of $T_p$ and define
$\pi' = \pi_p, \ldots, \pi_1$. We must show that $c(\pi, T_p) \le \frac{1}{2}\binom{p}{2} + O(p^{3/2} \log p)$ or, equivalently,
that $c(\pi, T_p) - c(\pi', T_p) \le O(p^{3/2} \log p)$. Let $a_1$ and $a_2$ be two integers satisfying
$p = a_1 + a_2$ and $a_1 \le 2^{r-1}$, $a_2 \le 2^{r-1}$. Let $A_1$ be the set of the first $a_1$ vertices in
the permutation $\pi$ and let $A_2$ be the set of the last $a_2$ vertices in $\pi$. By Lemma 9.1.2,

$$e(A_1, A_2) - e(A_2, A_1) \le (a_1 a_2 p)^{1/2} \le 2^{r-1} p^{1/2}.$$

Next, let $a_{11}, a_{12}, a_{21}, a_{22}$ be integers each of which does not exceed $2^{r-2}$ such that
$a_1 = a_{11} + a_{12}$ and $a_2 = a_{21} + a_{22}$. Let $A_{11}$ be the subset of $A_1$ consisting of
those $a_{11}$ elements of $A_1$ that appear first in $\pi$, and let $A_{12}$ be the set of the $a_{12}$
remaining elements of $A_1$. The partition of $A_2$ into the two sets $A_{21}$ and $A_{22}$ is
defined similarly. By applying Lemma 9.1.2 we obtain

$$e(A_{11}, A_{12}) - e(A_{12}, A_{11}) + e(A_{21}, A_{22}) - e(A_{22}, A_{21})
\le (a_{11} a_{12} p)^{1/2} + (a_{21} a_{22} p)^{1/2}
\le 2 \cdot 2^{r-2} p^{1/2} = 2^{r-1} p^{1/2}.$$

Continuing in the same manner we obtain, in the ith step, a partition of the set of
vertices into $2^i$ blocks, each consisting of at most $2^{r-i}$ consecutive elements in the
permutation $\pi$. This partition is obtained by splitting each block in the partition
corresponding to the previous step into two parts. By applying Lemma 9.1.2 to each
such pair $A_{\epsilon 1}, A_{\epsilon 2}$ (where here $\epsilon$ is a vector of length i − 1 with {1, 2}-entries), and
by summing we conclude that the sum over all these $2^{i-1}$ vectors $\epsilon$ of the differences
$e(A_{\epsilon 1}, A_{\epsilon 2}) - e(A_{\epsilon 2}, A_{\epsilon 1})$ does not exceed

$$2^{i-1} 2^{r-i} p^{1/2} = 2^{r-1} p^{1/2}.$$

Observe that the sum of the left-hand sides of all these inequalities as i ranges from
1 to r is precisely the difference $c(\pi, T_p) - c(\pi', T_p)$. Therefore by summing we
obtain

$$c(\pi, T_p) - c(\pi', T_p) \le 2^{r-1} p^{1/2} r = O(p^{3/2} \log p),$$

completing the proof. ∎
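
For a very small prime the quantity $c(T_p)$ can simply be computed by exhausting all rankings. The sketch below does this for p = 7; it is an illustration under our own naming, and the brute force over all p! permutations is of course hopeless beyond tiny p.

```python
from itertools import permutations


def consistent_arcs(order, arcs):
    """c(pi, T): the number of arcs (x, y) in which x precedes y in `order`."""
    position = {v: k for k, v in enumerate(order)}
    return sum(1 for (x, y) in arcs if position[x] < position[y])


p = 7
residues = {(x * x) % p for x in range(1, p)}
arcs = [(i, j) for i in range(p) for j in range(p)
        if i != j and (i - j) % p in residues]

best = max(consistent_arcs(order, arcs) for order in permutations(range(p)))
total = p * (p - 1) // 2
print(best, total)    # c(T_7) next to the number of arcs of T_7
```

Since reversing a ranking exchanges consistent and inconsistent arcs, the printed maximum is at least half of the total, and it is strictly smaller than the total because $T_7$ is regular and hence not transitive.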


We note that any antisymmetric matrix with {1,—1}-entries in which each two 
columns are roughly orthogonal can be used to give a construction of a tournament 
as above. Some related results appear in Frankl, Rödl and Wilson (1988). The
tournaments $T_p$, however, have stronger pseudorandom properties than do some of
these other tournaments. For example, for every $k \le \frac{1}{2} \log p$, and for every set S
of k vertices of $T_p$, the number of vertices of $T_p$ that beat all the members of S is
$(1 + o(1))p/2^k$. This was proved by Graham and Spencer (1971) by applying Weil's
famous theorem known as the Riemann hypothesis for curves over finite fields [Weil
(1948)]. Taking a sufficiently large p this supplies an explicit construction for the
Schütte problem mentioned in Chapter 1.


9.2 EIGENVALUES AND EXPANDERS 


A graph G = (V, E) is called an (n, d, c)-expander if it has n vertices, the maximum
degree of a vertex is d, and for every set of vertices $W \subset V$ of cardinality $|W| \le n/2$,
the inequality $|N(W)| \ge c|W|$ holds, where N(W) denotes the set of all vertices in
$V \setminus W$ adjacent to some vertex in W. We note that sometimes a slightly different
definition is used, but the difference is not essential. Expanders share many of the
properties of sparse random graphs and are the subject of an extensive literature. A
family of linear expanders of density d and expansion c is a sequence $\{G_i\}_{i=1}^{\infty}$, where
$G_i$ is an $(n_i, d, c)$-expander and $n_i$ tends to infinity as i tends to infinity.

Such a family is the main component of the parallel sorting network of Ajtai, 
Komlós and Szemerédi (1983) and can be used for constructing certain fault tolerant
linear arrays. It also forms the basic building block used in the construction of graphs 
with special connectivity properties and small number of edges. Some other examples 
of the numerous applications of these graphs to various problems in theoretical 
computer science can be found, for example, in Alon (1986b) and its references. 

It is not too difficult to prove the existence of a family of linear expanders us- 
ing probabilistic arguments. This was first done by Pinsker (1973). An explicit 
construction is much more difficult to find and was first given by Margulis (1973). 
This construction was later improved by various authors; most known constructions 
are Cayley graphs of certain groups of matrices, and their expansion properties are 
proved by estimating the eigenvalues of the adjacency matrices of the graphs and 
by relying on the close correspondence between the expansion properties of a graph 
and its spectral properties. This correspondence was first studied, independently, by 
Tanner (1984) and by Alon and Milman (1984). Since it is somewhat simpler for the 
case of regular graphs we restrict our attention here to this case.

Let G = (V, E) be a d-regular graph and let $A = A_G = (a_{uv})_{u,v \in V}$ be its
adjacency matrix given by $a_{uv} = 1$ if $uv \in E$ and $a_{uv} = 0$ otherwise. Since G
is d-regular the largest eigenvalue of A is d, corresponding to the all 1 eigenvector.
Let $\lambda = \lambda(G)$ denote the second largest eigenvalue of G. For two (not necessarily
disjoint) subsets B and C of V let e(B, C) denote the number of ordered pairs (u, v),
where $u \in B$, $v \in C$ and uv is an edge of G. (Note that if B and C are disjoint this
is simply the number of edges of G that connect a vertex of B with a vertex of C.)


Theorem 9.2.1 For every partition of the set of vertices V into two disjoint subsets
B and C,

$$e(B, C) \ge \frac{(d - \lambda)|B||C|}{n}.$$


Proof. Put |V| = n, b = |B| and c = |C| = n − b. Let D = dI be the n by n scalar
matrix with the degree of regularity of G on its diagonal. Observe that for any real
vector x of length n (considered as a function $x : V \to R$) we have

$$((D - A)x, x) = \sum_{u \in V} \left( d\,x(u)^2 - \sum_{v : uv \in E} x(v)x(u) \right)
= d \sum_{u \in V} x(u)^2 - 2 \sum_{uv \in E} x(v)x(u) = \sum_{uv \in E} (x(u) - x(v))^2.$$

Define, now, a vector x by x(v) = −c if $v \in B$ and x(v) = b if $v \in C$. Note that A
and D − A have the same eigenvectors and that the eigenvalues of D − A are precisely
$d - \mu$, as $\mu$ ranges over all eigenvalues of A. Note, also, that $\sum_{v \in V} x(v) = 0$; that
is, x is orthogonal to the constant vector, which is the eigenvector of the smallest
eigenvalue of D − A. Since D − A is a symmetric matrix, its eigenvectors are
orthogonal to each other and form a basis of the n-dimensional space. It follows
that x is a linear combination of the other eigenvectors of D − A and hence, by the
definition of $\lambda$ and the fact that $d - \lambda$ is the second smallest eigenvalue of D − A,
we conclude that

$$((D - A)x, x) \ge (d - \lambda)(x, x) = (d - \lambda)(bc^2 + cb^2) = (d - \lambda)bcn.$$

By the second paragraph of the proof the left-hand side of the last inequality is
$\sum_{uv \in E} (x(u) - x(v))^2 = e(B, C) \cdot (b + c)^2 = e(B, C) \cdot n^2$. Thus

$$e(B, C) \ge \frac{(d - \lambda)bc}{n},$$

completing the proof. ∎
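
The inequality can be checked exhaustively on a small graph whose spectrum is known in closed form. In the sketch below we take the three-dimensional hypercube, an example of our own choosing: it is 3-regular on 8 vertices and the eigenvalues of the K-dimensional hypercube are K − 2i for i = 0, ..., K, so its second largest eigenvalue is 1. The code assumes this standard fact rather than computing eigenvalues.

```python
from itertools import combinations

K = 3                      # dimension of the hypercube Q_K
n, d = 2 ** K, K           # Q_K is K-regular on 2^K vertices
lam = K - 2                # second largest eigenvalue (spectrum of Q_K is K - 2i)

# vertices are bit strings; two vertices are adjacent iff they differ in one bit
edges = [(u, u ^ (1 << bit)) for u in range(n) for bit in range(K)
         if u < u ^ (1 << bit)]


def crossing_edges(B):
    """e(B, C) for C the complement of B: edges with exactly one endpoint in B."""
    return sum(1 for (u, v) in edges if (u in B) != (v in B))


violations = []
for size in range(1, n):
    for B in combinations(range(n), size):
        B = set(B)
        bound = (d - lam) * len(B) * (n - len(B)) / n
        if crossing_edges(B) < bound - 1e-9:
            violations.append(B)
print(len(violations))     # Theorem 9.2.1 predicts no violations
```

The bound is attained here: when B is a face of the cube, exactly 4 edges leave B and $(d - \lambda)|B||C|/n = 2 \cdot 4 \cdot 4/8 = 4$.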


Corollary 9.2.2 If $\lambda$ is the second largest eigenvalue of a d-regular graph G with n
vertices, then G is an (n, d, c)-expander for $c = (d - \lambda)/2d$.


Proof. Let W be a set of $w \le n/2$ vertices of G. By Theorem 9.2.1 there are at
least $(d - \lambda)w(n - w)/n \ge (d - \lambda)w/2$ edges from W to its complement. Since
no vertex in the complement is adjacent to more than d of these edges it follows that
$|N(W)| \ge (d - \lambda)w/2d$. ∎


The estimate for c in the last corollary can in fact be improved to $2(d - \lambda)/(3d - 2\lambda)$,
as shown by Alon and Milman (1984). Each of these estimates shows that if the second
largest eigenvalue of G is far from the first, then G is a good expander. The converse
of this is also true, although more complicated. This is given in the following result, 
proved in Alon (1986a), which we state without its proof. 


Theorem 9.2.3 If G is a d-regular graph which is an (n, d, c)-expander then

$$\lambda(G) \le d - \frac{c^2}{4 + 2c^2}.$$

The last two results supply an efficient algorithm for approximating the expanding 
properties of a d-regular graph; we simply compute (or estimate) its second largest 
eigenvalue. The larger the difference between this eigenvalue and d is, the better 
expanding properties of G follow. It is thus natural to ask how far from d this second 
eigenvalue can be. It is known [see Nilli (1991)] that the second largest eigenvalue
of any d-regular graph with diameter k is at least $2\sqrt{d-1}\,(1 - O(1/k))$. Therefore,
in any infinite family of d-regular graphs, the limsup of the second largest eigenvalue
is at least $2\sqrt{d-1}$. Lubotzky, Phillips and Sarnak (1986), and independently,
Margulis (1988), gave, for every d = p + 1 where p is a prime congruent to 1
modulo 4, explicit constructions of infinite families of d-regular graphs $G_i$ with
second largest eigenvalues $\lambda(G_i) \le 2\sqrt{d-1}$. These graphs are Cayley graphs of
factor groups of the group of all two by two invertible matrices over a finite field, and
their eigenvalues are estimated by applying results of Eichler and Igusa concerning
the Ramanujan Conjecture. Eichler's proof relies on Weil's Theorem mentioned in
the previous section. The nonbipartite graphs G constructed in this manner satisfy
a somewhat stronger assertion than $\lambda(G) \le 2\sqrt{d-1}$. In fact, besides their largest
eigenvalue d, they do not have eigenvalues whose absolute value exceeds $2\sqrt{d-1}$.
This fact implies some strong pseudorandom properties, as shown in the next results. 


Theorem 9.2.4 Let G = (V, E) be a d-regular graph on n vertices, and suppose the
absolute value of each of its eigenvalues but the first one is at most $\lambda$. For a vertex
$v \in V$ and a subset B of V denote by N(v) the set of all neighbors of v in G, and let
$N_B(v) = N(v) \cap B$ denote the set of all neighbors of v in B. Then, for every subset
B of cardinality bn of V,

$$\sum_{v \in V} (|N_B(v)| - bd)^2 \le \lambda^2 b(1 - b)n.$$


Observe that in a random d-regular graph each vertex v would tend to have about
bd neighbors in each set of size bn. The above theorem shows that if $\lambda$ is much
smaller than d then for most vertices v, $|N_B(v)|$ is not too far from bd.


Proof. Let A be the adjacency matrix of G and define a vector $f : V \to R$ by
f(v) = 1 − b for $v \in B$ and f(v) = −b for $v \notin B$. Clearly $\sum_{v \in V} f(v) = 0$; that is,
f is orthogonal to the eigenvector of the largest eigenvalue of A. Therefore

$$(Af, Af) \le \lambda^2 (f, f).$$

The right-hand side of the last inequality is
$\lambda^2 (bn(1 - b)^2 + (1 - b)nb^2) = \lambda^2 b(1 - b)n$.
The left-hand side is

$$\sum_{v \in V} \big( (1 - b)|N_B(v)| - b(d - |N_B(v)|) \big)^2 = \sum_{v \in V} (|N_B(v)| - bd)^2.$$

The desired result follows. ∎
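
A small exhaustive check may help fix the statement. The sketch below uses the Petersen graph, an example of our own choosing, realized as the graph on the 2-subsets of a 5-element set with disjoint subsets adjacent. We assume the standard fact that its eigenvalues are 3, 1 and −2, so every eigenvalue but the first has absolute value at most 2.

```python
from itertools import combinations

# Petersen graph: vertices are 2-subsets of {0,...,4}, adjacent iff disjoint
vertices = list(combinations(range(5), 2))
neighbors = {v: [u for u in vertices if not set(u) & set(v)] for v in vertices}
n, d, lam = 10, 3, 2.0     # spectrum 3, 1, -2, hence lam = 2 (assumed, not computed)


def deviation_sum(B):
    """Sum over all v of (|N_B(v)| - b d)^2, where b = |B| / n."""
    b = len(B) / n
    return sum((sum(1 for u in neighbors[v] if u in B) - b * d) ** 2
               for v in vertices)


# smallest value of (right-hand side) - (left-hand side) over all subsets B
worst_slack = min(
    lam ** 2 * (len(B) / n) * (1 - len(B) / n) * n - deviation_sum(set(B))
    for size in range(n + 1)
    for B in combinations(vertices, size))
print(worst_slack >= -1e-9)    # True means the inequality held for every B
```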

Corollary 9.2.5 Let G = (V, E), d, n and $\lambda$ be as in Theorem 9.2.4. Then for every
two sets of vertices B and C of G, where |B| = bn and |C| = cn, we have

$$|e(B, C) - cbdn| \le \lambda \sqrt{bc}\, n.$$


Proof. By Theorem 9.2.4,

$$\sum_{v \in C} (|N_B(v)| - bd)^2 \le \sum_{v \in V} (|N_B(v)| - bd)^2 \le \lambda^2 b(1 - b)n.$$

Thus, by the Cauchy-Schwarz Inequality,

$$|e(B, C) - cbdn| \le \sum_{v \in C} \big| |N_B(v)| - bd \big|
\le \sqrt{cn} \left( \sum_{v \in C} (|N_B(v)| - bd)^2 \right)^{1/2}
\le \sqrt{cn}\, \lambda \sqrt{b(1 - b)n} \le \lambda \sqrt{bc}\, n.$$

∎


The special case B = C gives the following result. A slightly stronger estimate 
is proved in a similar way in Alon and Chung (1988). 


Corollary 9.2.6 Let G = (V, E), d, n and $\lambda$ be as in Theorem 9.2.4. Let B be an
arbitrary set of bn vertices of G and let $e(B) = \frac{1}{2}e(B, B)$ be the number of edges in
the induced subgraph of G on B. Then

$$\left| e(B) - \frac{1}{2} b^2 dn \right| \le \frac{1}{2} \lambda bn.$$


A walk of length l in a graph G is a sequence $v_0, \ldots, v_l$ of vertices of G, where
for each $1 \le i \le l$, $v_{i-1}v_i$ is an edge of G. Obviously, the total number of walks of
length l in a d-regular graph on n vertices is precisely $n \cdot d^l$. Suppose, now, that C
is a subset of, say, n/2 vertices of G. How many of these walks do not contain any
vertex of C? If G is disconnected it may happen that half of these walks avoid C.
However, as shown by Ajtai, Komlós and Szemerédi (1987), there are many fewer
such walks if all the eigenvalues of G but the largest are small. This result and some
of its extensions have several applications in theoretical computer science, as shown
in the above-mentioned paper [see also Cohen and Wigderson (1989)]. We conclude
this section by stating and proving the result and one of its applications.


Theorem 9.2.7 Let G = (V, E) be a d-regular graph on n vertices, and suppose
that each of its eigenvalues but the first one is at most $\lambda$. Let C be a set of cn vertices
of G. Then, for every l, the number of walks of length l in G that avoid C does not
exceed $(1 - c)n((1 - c)d + c\lambda)^l$.


Proof. Let A be the adjacency matrix of G and let A' be the adjacency matrix of its
induced subgraph on the complement of C. We claim that the maximum eigenvalue
of A' is at most $(1 - c)d + c\lambda$. To prove this claim we must show that for every
vector $f : V \to R$ satisfying f(v) = 0 for each $v \in C$ and $\sum_{v \in V} f(v)^2 = 1$, the
inequality $(Af, f) \le (1 - c)d + c\lambda$ holds. Let $f_1, f_2, \ldots, f_n$ be an orthonormal basis
of eigenvectors of A, where $f_i$ is the eigenvector of $\lambda_i$, $\lambda_1 = d$ and each entry of $f_1$
is $1/\sqrt{n}$. Then $f = \sum_{i=1}^{n} c_i f_i$, where $\sum_{i=1}^{n} c_i^2 = 1$ and

$$|c_1| = |(f, f_1)| = \frac{1}{\sqrt{n}} \left| \sum_{v \in V \setminus C} f(v) \right|
\le \frac{1}{\sqrt{n}} \left( \sum_{v \in V \setminus C} f(v)^2 \right)^{1/2} \big( (1 - c)n \big)^{1/2} = (1 - c)^{1/2},$$

where here we used the Cauchy-Schwarz Inequality. Therefore $\sum_{i=2}^{n} c_i^2 \ge c$ and

$$(Af, f) = \sum_{i=1}^{n} c_i^2 \lambda_i \le (1 - c)d + c\lambda,$$

supplying the desired estimate for the largest eigenvalue of A'.

Let $\gamma_1 \ge \gamma_2 \ge \cdots \ge \gamma_m$ be the eigenvalues of A', where m = (1 − c)n. By
the Perron-Frobenius Theorem it follows that the absolute value of each of them is
at most $\gamma_1 \le (1 - c)d + c\lambda$. The total number of walks of length l that avoid C
is precisely $(A'^{\,l} g, g)$, where g is the all 1-vector indexed by the vertices in V − C.
By expressing g as a linear combination of the eigenvectors of A', $g = \sum_{i=1}^{m} b_i g_i$,
where $g_i$ is the eigenvector of $\gamma_i$, we conclude that this number is precisely

$$\sum_{i=1}^{m} b_i^2 \gamma_i^l \le \gamma_1^l \sum_{i=1}^{m} b_i^2 = m \gamma_1^l \le m((1 - c)d + c\lambda)^l.$$

Substituting m = (1 − c)n the desired result follows. ∎
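
The number of walks avoiding C is easy to compute by dynamic programming, which allows a direct comparison with the bound of Theorem 9.2.7. The sketch below is an illustration on the Petersen graph (our own choice of example); we assume the standard fact that its second largest eigenvalue is 1, and we try every set C and every length up to 5.

```python
from itertools import combinations

# Petersen graph: vertices are 2-subsets of {0,...,4}, adjacent iff disjoint
vertices = list(combinations(range(5), 2))
neighbors = {v: [u for u in vertices if not set(u) & set(v)] for v in vertices}
n, d, lam = 10, 3, 1.0     # second largest eigenvalue is 1 (assumed, not computed)


def walks_avoiding(C, length):
    """Number of walks v_0, ..., v_length that never visit a vertex of C."""
    allowed = [v for v in vertices if v not in C]
    count = {v: 1 for v in allowed}             # walks of length 0 ending at v
    for _ in range(length):
        count = {v: sum(count[u] for u in neighbors[v] if u not in C)
                 for v in allowed}
    return sum(count.values())


def bound(C, length):
    """(1 - c) n ((1 - c) d + c lam)^length with c = |C| / n."""
    c = len(C) / n
    return (1 - c) * n * ((1 - c) * d + c * lam) ** length


ok = all(walks_avoiding(set(C), l) <= bound(C, l) + 1e-9
         for size in range(n + 1)
         for C in combinations(vertices, size)
         for l in range(6))
print(ok)
```

With C empty the two sides agree, since every one of the $n d^l$ walks avoids C.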


A randomly chosen walk of length l in a graph G is a walk of length l in G chosen
according to a uniform distribution among all walks of that length. Note that if G is
d-regular such a walk can be chosen by choosing randomly its starting point $v_0$, and
then by choosing, for each $1 \le i \le l$, $v_i$ randomly among the d neighbors of $v_{i-1}$.


Corollary 9.2.8 Let G = (V, E), d, n, $\lambda$, C and c be as in Theorem 9.2.7 and suppose

$$(1 - c)d + c\lambda \le \frac{d}{\sqrt{2}}.$$

Then, for every l, the probability that a randomly chosen walk of length l in G avoids
C is at most $2^{-l/2}$.


Proof. The number of walks of length l in G that avoid C is at most
$(1 - c)n((1 - c)d + c\lambda)^l \le nd^l 2^{-l/2}$, by Theorem 9.2.7. Since the total number of walks is $nd^l$,
the desired result follows. ∎


The results above are useful for amplification of probabilities in randomized 
algorithms. Although such an amplification can be achieved for any Monte Carlo 
algorithm, we prefer, for simplicity, to consider one representative example: the 
primality testing algorithm of Rabin (1980). 

For an odd integer q, define two integers a and b by $q - 1 = 2^a b$, where b is odd.
An integer x, $1 \le x \le q - 1$, is called a witness (for the nonprimality of q) if for the
sequence $x_0, \ldots, x_a$ defined by $x_0 = x^b \pmod q$ and $x_i = x_{i-1}^2 \pmod q$ for
$1 \le i \le a$ either $x_a \ne 1$ or there is an i such that $x_i \ne -1, 1$ and $x_{i+1} = 1$. One
can show that if q is a prime then there are no such witnesses for q, whereas if q is
an odd nonprime then at least half of the numbers between 1 and q − 1 are witnesses
for q. (In fact, at least 3/4 are witnesses, as shown by Rabin.) This suggests the
following randomized algorithm for testing if an odd integer q is a prime (for even
integers there is a simpler algorithm!).

Choose, randomly, an integer x between 1 and q − 1 and check if it is a witness.
If it is, report that q is not a prime. Otherwise, report that q is a prime.
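
The witness condition and the one-round test translate directly into code. The following is a minimal sketch of the definition given above, for odd q > 2; the function names are ours and no attempt is made at efficiency beyond using modular exponentiation.

```python
import random


def is_witness(x, q):
    """Check whether x is a witness for the nonprimality of the odd integer q > 2."""
    a, b = 0, q - 1
    while b % 2 == 0:            # write q - 1 = 2^a * b with b odd
        a, b = a + 1, b // 2
    seq = [pow(x, b, q)]         # x_0 = x^b (mod q)
    for _ in range(a):           # x_i = x_{i-1}^2 (mod q)
        seq.append(seq[-1] * seq[-1] % q)
    if seq[a] != 1:
        return True
    # some x_i different from 1 and -1 whose square is 1 also exposes q
    return any(seq[i] not in (1, q - 1) and seq[i + 1] == 1 for i in range(a))


def probably_prime(q, rng=random):
    """One round: report 'prime' iff the randomly chosen x is not a witness."""
    x = rng.randint(1, q - 1)
    return not is_witness(x, q)


def witness_count(q):
    """Number of witnesses among 1, ..., q - 1 (brute force, small q only)."""
    return sum(is_witness(x, q) for x in range(1, q))


print(witness_count(13), witness_count(9))
```

For the prime 13 there are no witnesses, while for q = 9 every x except 1 and 8 is a witness, so the count is 6 out of 8.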

Observe that if q is a prime, the algorithm certainly reports it is a prime, whereas
if q is not a prime, the probability that the algorithm makes a mistake and reports it
as a prime is at most 1/2. What if we wish to reduce the probability of making such
a mistake? Clearly we can simply repeat the algorithm. If we repeat it l independent
times, then the probability of making an error (i.e., reporting a nonprime as a prime)
decreases to $1/2^l$. However, the number of random bits required for this procedure
is $l \cdot \log(q - 1)$.

Suppose we wish to use fewer random bits. By applying the properties of a
randomly chosen walk on an appropriate graph, proved in the last two results, we can
obtain the same estimate for the error probability by using only $\log(q - 1) + O(l)$
random bits. This is done as follows.

Let G be a d-regular graph with q − 1 vertices, labeled by all integers between 1
and q − 1. Suppose G has no eigenvalue, but the first one, that exceeds $\lambda$ and suppose
that

$$\frac{d}{2} + \frac{\lambda}{2} \le \frac{d}{\sqrt{2}}. \qquad (9.1)$$

Now choose randomly a walk of length 2l in the graph G, and check, for each of the
numbers labeling its vertices, if it is a witness. If q is a nonprime, then at least half of
the vertices of G are labeled by witnesses. Hence, by Corollary 9.2.8 and by (9.1),
the probability that no witness is on the walk is at most $2^{-2l/2} = 2^{-l}$. Thus we
obtain the same reduction in the error-probability as the one obtained by choosing
l independent witnesses. Let us estimate the number of random bits required for
choosing such a random walk.

The known constructions of expanders given by Lubotzky et al. (1986) or by
Margulis (1988) give explicit families of graphs with degree d and with $\lambda \le 2\sqrt{d-1}$,
for each d = p + 1, where p is a prime congruent to 1 modulo 4. [We note that these
graphs will not have exactly q − 1 vertices but this does not cause any real problem
as we can take a graph with n vertices, where $q - 1 \le n \le (1 + o(1))(q - 1)$, and
label its ith vertex by i (mod q − 1). In this case the number of vertices labeled
by witnesses would still be at least $(\frac{1}{2} + o(1))n$.] One can easily check that, for
example, d = 30 and $\lambda = 2\sqrt{29}$ satisfy (9.1) and thus we can use a 30-regular graph.
The number of random bits required for choosing a random walk of length 2l in it is
less than $\log(q - 1) + 10l + 1$, much less than the $l \log(q - 1)$ bits that are needed
in the repetition procedure.
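
The arithmetic behind these claims can be checked in a few lines. In the sketch below we read (9.1) as the case c = 1/2 of the hypothesis of Corollary 9.2.8, that is $d/2 + \lambda/2 \le d/\sqrt{2}$, and we count random bits in the idealized sense of base-2 logarithms of the number of choices; the values of q and l are arbitrary illustrative choices of ours.

```python
import math


def satisfies_9_1(d, lam):
    """Condition (9.1), read as d/2 + lam/2 <= d / sqrt(2)."""
    return d / 2 + lam / 2 <= d / math.sqrt(2)


def bits_independent(q, l):
    """Bits for l independent uniform samples from {1, ..., q - 1}."""
    return l * math.log2(q - 1)


def bits_walk(q, l, d=30):
    """Bits for a uniform start vertex plus 2l uniform choices among d neighbors."""
    return math.log2(q - 1) + 2 * l * math.log2(d)


q, l = 2 ** 127 - 1, 40          # illustrative values only
print(satisfies_9_1(30, 2 * math.sqrt(29)))
print(round(bits_independent(q, l)), round(bits_walk(q, l)))
```

Since $2 \log_2 30 < 10$, the walk needs fewer than $\log(q - 1) + 10l$ bits in this idealized count, in agreement with the estimate above.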


9.3 QUASIRANDOM GRAPHS 


In this section we describe several pseudorandom properties of graphs, which, some- 
what surprisingly, turn out to be all equivalent. All the properties are ones satisfied, 
almost surely, by a random graph in which every edge is chosen, independently, with 
probability 1/2. The equivalence between some of these properties was first proved 
by several authors; see Thomason (1987), Frankl et al. (1988) and Alon and Chung 
(1988), but the first paper in which all of them (and some others) appear is the one 
by Chung, Graham and Wilson (1989). Our presentation here follows that paper, 
although, in order to simplify the presentation, we consider only the case of regular 
graphs. 

We first need some notation. For two graphs G and H, let $N^*_G(H)$ be the number
of labeled occurrences of H as an induced subgraph of G; that is, the number of
adjacency preserving injections $f : V(H) \to V(G)$ whose image is the set of
vertices of an induced copy of H in G. Similarly, $N_G(H)$ denotes the number
of labeled copies of H as a (not necessarily induced) subgraph of G. Note that
$N_G(H) = \sum_L N^*_G(L)$, where L ranges over all graphs on the set of vertices of H
obtained from H by adding to it a (possibly empty) set of edges.

Throughout this section G always denotes a graph with n vertices. We denote the
eigenvalues of its adjacency matrix (taken with multiplicities) by $\lambda_1, \ldots, \lambda_n$, where
$|\lambda_1| \ge \cdots \ge |\lambda_n|$. [Since we consider in this section only the eigenvalues of G
we simply write $\lambda_1$ and not $\lambda_1(G)$.] Recall also the following notation, used in the
previous section: for a vertex v of G, N(v) denotes the set of its neighbors in G. If
S is a set of vertices of G, e(S) denotes the number of edges in the induced subgraph
of G on S. If B and C are two (not necessarily disjoint) subsets of vertices of G,
e(B, C) denotes the number of ordered pairs (b, c), where $b \in B$, $c \in C$ and bc is an
edge of G. Thus $e(S) = \frac{1}{2}e(S, S)$.

We can now state the pseudorandom properties considered here. All the properties 
refer to a graph G = (V, E) with n vertices. Throughout the section, we use the $o(\cdot)$-
notation, without mentioning the precise behavior of each $o(\cdot)$. Thus occurrences of
two o(1), say, need not mean that both are identical and only mean that if we consider 
a family of graphs G and let their number of vertices n tend to infinity then each o(1) 
tends to 0. 


Property $P_1(s)$: For every graph H(s) on s vertices
$N^*_G(H(s)) = (1 + o(1)) n^s 2^{-\binom{s}{2}}$.

Property $P_2$: For the cycle C(4) with 4 vertices $N_G(C(4)) \le (1 + o(1))(n/2)^4$.

Property $P_3$: $|\lambda_2| = o(n)$.

Property $P_4$: For every set S of vertices of G, $e(S) = \frac{1}{4}|S|^2 + o(n^2)$.

Property $P_5$: For every two sets of vertices B and C, $e(B, C) = \frac{1}{2}|B||C| + o(n^2)$.

Property $P_6$: $\sum_{u,v \in V} \big| |N(u) \cap N(v)| - \frac{n}{4} \big| = o(n^3)$.


It is easy to check that all the properties above are satisfied, almost surely, by a 
random graph on n vertices. In this section we show that all these properties are 
equivalent for a regular graph with n vertices and degree of regularity about n/2. 
The fact that the innocent-looking property $P_2$ is strong enough to imply for such
graphs $P_1(s)$ for every $s \ge 1$ is one of the interesting special cases of this result.

Graphs that satisfy any (and thus all) of the properties above are called quasirandom.
As noted above the assumption that G is regular can be dropped (at the expense
of slightly modifying property $P_2$ and slightly complicating the proofs).
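
To get a feeling for the properties it helps to evaluate two of them on a concrete graph. The sketch below uses the Paley graph on p = 101 vertices, an example of our own choosing: for a prime p congruent to 1 modulo 4 it joins u and v when u − v is a nonzero square, and it is regular of degree (p − 1)/2. The code computes the ratio $N_G(C(4))/(n/2)^4$ from property $P_2$ and the sum of property $P_6$ divided by $n^3$; a single graph only illustrates the definitions, since the properties are asymptotic.

```python
def paley_graph(p):
    """Adjacency sets of the Paley graph: u ~ v iff u - v is a nonzero square mod p.

    Assumes p is a prime congruent to 1 modulo 4, so that adjacency is symmetric.
    """
    squares = {(x * x) % p for x in range(1, p)}
    return [{v for v in range(p) if v != u and (u - v) % p in squares}
            for u in range(p)]


def labeled_four_cycles(adj):
    """N_G(C(4)): pick opposite corners u != w, then an ordered pair of distinct
    common neighbors for the two remaining corners."""
    n = len(adj)
    total = 0
    for u in range(n):
        for w in range(n):
            if u != w:
                common = len(adj[u] & adj[w])
                total += common * (common - 1)
    return total


def codegree_deviation(adj):
    """Left-hand side of P_6: sum over ordered pairs of ||N(u) & N(v)| - n/4|."""
    n = len(adj)
    return sum(abs(len(adj[u] & adj[v]) - n / 4)
               for u in range(n) for v in range(n))


p = 101
adj = paley_graph(p)
cycle_ratio = labeled_four_cycles(adj) / (p / 2) ** 4
deviation_ratio = codegree_deviation(adj) / p ** 3
print(cycle_ratio, deviation_ratio)
```

The first ratio should be close to 1 and the second close to 0 for large p; at p = 101 the lower-order terms are still visible.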


Theorem 9.3.1 Let G be a d-regular graph on n vertices, where $d = (\frac{1}{2} + o(1))n$.
If G satisfies any one of the seven properties $P_1(4)$, $P_1(s)$ for all $s \ge 1$, $P_2$, $P_3$,
$P_4$, $P_5$, $P_6$ then it satisfies all seven.


Proof. We show that

$$P_1(4) \Rightarrow P_2 \Rightarrow P_3 \Rightarrow P_4 \Rightarrow P_5 \Rightarrow P_6 \Rightarrow P_1(s) \text{ for all } s \ge 1 \ (\Rightarrow P_1(4)).$$


1. $P_1(4) \Rightarrow P_2$.
Suppose G satisfies $P_1(4)$. Then $N_G(C(4)) = \sum_L N^*_G(L)$, as L ranges over
the four labeled graphs obtained from a labeled C(4) by adding to it a (possibly
empty) set of edges. Since G satisfies $P_1(4)$, $N^*_G(L) = (1 + o(1)) n^4 2^{-6}$ for
each of these graphs L and hence $N_G(C(4)) = (1 + o(1)) n^4 2^{-4}$, showing
that G satisfies $P_2$.


2. $P_2 \Rightarrow P_3$.
Suppose G satisfies $P_2$ and let A be its adjacency matrix. The trace of $A^4$
is precisely $\sum_{i=1}^{n} \lambda_i^4$. On the other hand it is easy to see that this trace is
precisely the number of (labeled) closed walks of length 4 in G, that is, the
number of sequences $v_0, v_1, v_2, v_3, v_4 = v_0$ of vertices of G such that $v_i v_{i+1}$
is an edge for each $0 \le i \le 3$. This number is $N_G(C(4))$ plus the number
of such sequences in which $v_2 = v_0$, which is $nd^2$, plus the number of such
sequences in which $v_2 \ne v_0$ and $v_3 = v_1$, which is nd(d − 1). Thus

$$\sum_{i=1}^{n} \lambda_i^4 = d^4 + \sum_{i=2}^{n} \lambda_i^4 = (1 + o(1))(n/2)^4 + \sum_{i=2}^{n} \lambda_i^4
= N_G(C(4)) + O(n^3) \le (1 + o(1))(n/2)^4.$$

It follows that $\sum_{i=2}^{n} \lambda_i^4 = o(n^4)$ and hence that $|\lambda_2| = o(n)$, as needed.

3. $P_3 \Rightarrow P_4$.
This is an immediate consequence of Corollary 9.2.6.
4. $P_4 \Rightarrow P_5$.

Suppose G satisfies $P_4$. We first claim that it satisfies property $P_5$ for disjoint
sets of vertices B and C. Indeed, if B and C are disjoint then

$$e(B, C) = e(B \cup C) - e(B) - e(C)
= \frac{1}{4}(|B| + |C|)^2 - \frac{1}{4}|B|^2 - \frac{1}{4}|C|^2 + o(n^2)
= \frac{1}{2}|B||C| + o(n^2),$$

proving the claim.

In case B and C are not disjoint we have

$$e(B, C) = e(B \setminus C, C \setminus B) + e(B \cap C, C \setminus B) + e(B \cap C, B \setminus C) + 2e(B \cap C).$$

Put |B| = b, |C| = c and $|B \cap C| = x$. By the above expression for e(B, C)
and by the fact that G satisfies $P_4$ and $P_5$ for disjoint B and C we get

$$e(B, C) = \frac{1}{2}(b - x)(c - x) + \frac{1}{2}x(c - x) + \frac{1}{2}x(b - x) + \frac{1}{2}x^2 + o(n^2)
= \frac{1}{2}bc + o(n^2) = \frac{1}{2}|B||C| + o(n^2),$$

showing that G satisfies $P_5$.

5. $P_5 \Rightarrow P_6$.

Suppose that G satisfies $P_5$ and recall that G is d-regular, where
$d = (\frac{1}{2} + o(1))n$. Let v be a fixed vertex of G, and let us estimate the sum

$$\sum_{u \in V, u \ne v} \Big| |N(u) \cap N(v)| - \frac{n}{4} \Big|.$$

Define

$$B_1 = \{u \in V, u \ne v : |N(u) \cap N(v)| \ge \tfrac{n}{4}\}$$

and similarly

$$B_2 = \{u \in V, u \ne v : |N(u) \cap N(v)| < \tfrac{n}{4}\}.$$

Let C be the set of all neighbors of v in G. Observe that

$$\sum_{u \in B_1} \Big| |N(u) \cap N(v)| - \frac{n}{4} \Big| = \sum_{u \in B_1} |N(u) \cap N(v)| - |B_1|\frac{n}{4}
= e(B_1, C) - |B_1|\frac{n}{4}.$$

Since G satisfies $P_5$, and since $d = (\frac{1}{2} + o(1))n$ the last difference is
$\frac{1}{2}|B_1|d + o(n^2) - |B_1|n/4 = o(n^2)$.

A similar argument implies that

$$\sum_{u \in B_2} \Big| |N(u) \cap N(v)| - \frac{n}{4} \Big| = o(n^2).$$

It follows that for every vertex v of G,

$$\sum_{u \in V, u \ne v} \Big| |N(u) \cap N(v)| - \frac{n}{4} \Big| = o(n^2),$$

and by summing over all vertices v we conclude that G satisfies property $P_6$.


6. $P_6 \Rightarrow P_1(s)$ for all $s \ge 1$.

Suppose G = (V, E) satisfies $P_6$. For any two distinct vertices u and v of G
let a(u, v) be 1 if $uv \in E$ and 0 otherwise. Also, define
$s(u, v) = |\{w \in V : a(u, w) = a(v, w)\}|$. Since G is $d = (\frac{1}{2} + o(1))n$-regular,

$$s(u, v) = 2|N(u) \cap N(v)| + n - 2d = 2|N(u) \cap N(v)| + o(n).$$

Therefore the fact that G satisfies $P_6$ implies that

$$\sum_{u,v \in V} \Big| s(u, v) - \frac{n}{2} \Big| = o(n^3). \qquad (9.2)$$

Let H = H(s) be an arbitrary fixed graph on s vertices, and put
$N_s = N^*_G(H(s))$. We must show that

$$N_s = (1 + o(1)) n^s 2^{-\binom{s}{2}}.$$

Denote the vertex set of H(s) by $\{v_1, \ldots, v_s\}$. For each $1 \le r \le s$, put
$V_r = \{v_1, \ldots, v_r\}$, and let H(r) be the induced subgraph of H on $V_r$. We
prove, by induction on r, that for $N_r = N^*_G(H(r))$,

$$N_r = (1 + o(1)) n_{(r)} 2^{-\binom{r}{2}}, \qquad (9.3)$$

where $n_{(r)} = n(n - 1) \cdots (n - r + 1)$.


This is trivial for $r = 1$. Assuming it holds for $r$, where $1 \leq r < s$, we prove
it for $r+1$. For a vector $\alpha = (\alpha_1,\ldots,\alpha_r)$ of distinct vertices of $G$, and for a
vector $\epsilon = (\epsilon_1,\ldots,\epsilon_r)$ of $(0,1)$-entries, define

\[ f_r(\alpha,\epsilon) = |\{v \in V : v \neq \alpha_1,\ldots,\alpha_r \text{ and } a(v,\alpha_j) = \epsilon_j \text{ for all } 1 \leq j \leq r\}| . \]

Clearly $N_{r+1}$ is the sum of the $N_r$ quantities $f_r(\alpha,\epsilon)$ in which $\epsilon_j = a(v_{r+1}, v_j)$
and $\alpha$ ranges over all $N_r$ induced copies of $H(r)$ in $G$.

Observe that altogether there are precisely $n_{(r)}2^r$ quantities $f_r(\alpha,\epsilon)$. It is
convenient to view $f_r(\alpha,\epsilon)$ as a random variable defined on a sample space
of $n_{(r)}2^r$ points, each having an equal probability. To complete the proof we
compute the expectation and the variance of this random variable. We show
that the variance is so small that most of the quantities $f_r(\alpha,\epsilon)$ are very close
to the expectation, and thus obtain a sufficiently accurate estimate for $N_{r+1}$,
which is the sum of $N_r$ such quantities.

We start with the simple computation of the expectation $E[f_r]$ of $f_r(\alpha,\epsilon)$. We
have

\[ E[f_r] = \frac{1}{n_{(r)}2^r} \sum_{\alpha,\epsilon} f_r(\alpha,\epsilon) = \frac{1}{n_{(r)}2^r} \sum_{\alpha} \sum_{\epsilon} f_r(\alpha,\epsilon) = \frac{1}{n_{(r)}2^r}\, n_{(r)}(n-r) = \frac{n-r}{2^r} , \]

where we used the fact that every vertex $v \neq \alpha_1,\ldots,\alpha_r$ defines $\epsilon$ uniquely.


Next, we estimate the quantity $S_r$ defined by
\[ S_r = \sum_{\alpha,\epsilon} f_r(\alpha,\epsilon)\,(f_r(\alpha,\epsilon) - 1) . \]

We claim that
\[ S_r = \sum_{u \neq v} s(u,v)_{(r)} . \tag{9.4} \]
To prove this claim, observe that $S_r$ can be interpreted as the number of ordered
triples $(\alpha, \epsilon, (u,v))$, where $\alpha = (\alpha_1,\ldots,\alpha_r)$ is an ordered set of $r$ distinct
vertices of $G$, $\epsilon = (\epsilon_1,\ldots,\epsilon_r)$ is a binary vector of length $r$, and $(u,v)$ is an
ordered pair of additional vertices of $G$ so that

\[ a(u,\alpha_k) = a(v,\alpha_k) = \epsilon_k \quad \text{for all } 1 \leq k \leq r. \]

For each fixed $\alpha$ and $\epsilon$, there are precisely $f_r(\alpha,\epsilon)(f_r(\alpha,\epsilon) - 1)$ choices for
the pair $(u,v)$ and hence $S_r$ counts the number of these triples.

Now, let us compute this number by first choosing $u$ and $v$. Once $u, v$ are
chosen, the additional vertices $\alpha_1,\ldots,\alpha_r$ must all belong to the set $\{w \in V : a(u,w) = a(v,w)\}$. Since the cardinality of this set is $s(u,v)$ it follows that
there are $s(u,v)_{(r)}$ choices for $\alpha_1,\ldots,\alpha_r$. Once these are chosen the vector $\epsilon$
is determined and thus (9.4) follows.
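The double count behind (9.4), and the expectation $E[f_r] = (n-r)/2^r$, can be verified by brute force on a small graph. The following is a minimal sketch, not part of the original proof: it assumes that the agreement count is taken over vertices $w$ other than $u$ and $v$ (which is what the counting argument uses, and differs from $s(u,v)$ by at most 2), and all function names are hypothetical.

```python
from itertools import permutations, product

def falling(x, r):
    """Falling factorial x(x-1)...(x-r+1)."""
    out = 1
    for i in range(r):
        out *= x - i
    return out

def f_r(adj, alpha, eps):
    """Number of v outside alpha with a(v, alpha_j) = eps_j for every j."""
    return sum(1 for v in range(len(adj))
               if v not in alpha
               and all(int(adj[v][a]) == e for a, e in zip(alpha, eps)))

def double_count(adj, r):
    n = len(adj)
    values = [f_r(adj, alpha, eps)
              for alpha in permutations(range(n), r)
              for eps in product((0, 1), repeat=r)]
    S_r = sum(f * (f - 1) for f in values)

    def agree(u, v):
        # assumption: w ranges over vertices other than u and v
        return sum(1 for w in range(n) if w not in (u, v) and adj[u][w] == adj[v][w])

    rhs = sum(falling(agree(u, v), r) for u in range(n) for v in range(n) if u != v)
    return S_r, rhs, sum(values)

# 5-cycle as a small test graph
n = 5
adj = [[int((i - j) % n in (1, n - 1)) for j in range(n)] for i in range(n)]
S_r, rhs, total = double_count(adj, 2)
```

Here `total` is the sum of all $n_{(r)}2^r$ values of $f_r$, so dividing it by $n_{(r)}2^r$ gives the expectation computed above.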


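We next claim that (9.2) implies

\[ \sum_{u \neq v} s(u,v)_{(r)} = (1+o(1))\, n^{r+2} 2^{-r} . \tag{9.5} \]

To prove this claim define $\epsilon_{uv} = s(u,v) - n/2$. Observe that, by (9.2),
$\sum_{u \neq v} |\epsilon_{uv}| = o(n^3)$ and $|\epsilon_{uv}| \leq n/2 \leq n$ for each $u, v$. Hence, for every
fixed $a \geq 1$,
\[ \sum_{u \neq v} |\epsilon_{uv}|^a \leq n^{a-1} \sum_{u \neq v} |\epsilon_{uv}| = o(n^{a+2}) . \]

This implies that

\begin{align*}
\sum_{u \neq v} s(u,v)_{(r)} &= \sum_{u \neq v} \Big( \frac{n}{2} + \epsilon_{uv} \Big)_{(r)} \\
&= \sum_{k=0}^{r} \sum_{u \neq v} c_k \Big( \frac{n}{2} \Big)^{k} \epsilon_{uv}^{\,r-k} \quad \text{(for appropriate constants } c_k \text{)} \\
&= \Big( \frac{n}{2} \Big)^{r} n_{(2)} + \sum_{k=0}^{r-1} \sum_{u \neq v} c_k \Big( \frac{n}{2} \Big)^{k} \epsilon_{uv}^{\,r-k} \\
&\leq \Big( \frac{n}{2} \Big)^{r} n_{(2)} + \sum_{k=0}^{r-1} \sum_{u \neq v} |c_k|\, n^{k} |\epsilon_{uv}|^{r-k} \\
&\leq n^{r+2} 2^{-r} + c \sum_{k=0}^{r-1} n^{k} \sum_{u \neq v} |\epsilon_{uv}|^{r-k} \quad \text{(for an appropriate constant } c \text{)} \\
&\leq n^{r+2} 2^{-r} + c \sum_{k=0}^{r-1} n^{k} \cdot o(n^{r-k+2}) \\
&= n^{r+2} 2^{-r} (1+o(1)),
\end{align*}

implying (9.5).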
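By (9.4) and (9.5), $S_r = (1+o(1))\, n^{r+2} 2^{-r}$. Therefore

\begin{align*}
\sum_{\alpha,\epsilon} (f_r(\alpha,\epsilon) - E[f_r])^2 &= \sum_{\alpha,\epsilon} f_r^2(\alpha,\epsilon) - \sum_{\alpha,\epsilon} E[f_r]^2 \\
&= \sum_{\alpha,\epsilon} \big( f_r^2(\alpha,\epsilon) - f_r(\alpha,\epsilon) \big) + \sum_{\alpha,\epsilon} f_r(\alpha,\epsilon) - n_{(r)} 2^r (n-r)^2 2^{-2r} \\
&= S_r + n_{(r)} 2^r E[f_r] - n_{(r)} 2^r (n-r)^2 2^{-2r} \\
&= S_r + n_{(r+1)} - n_{(r)} 2^r (n-r)^2 2^{-2r} = o(n^{r+2}) .
\end{align*}

Recall that $N_{r+1}$ is the summation of $N_r$ quantities of the form $f_r(\alpha,\epsilon)$. Thus

\[ |N_{r+1} - N_r E[f_r]|^2 = \Big| \sum_{N_r \text{ terms}} (f_r(\alpha,\epsilon) - E[f_r]) \Big|^2 . \]
By Cauchy–Schwarz, the last expression is at most

\[ N_r \sum_{N_r \text{ terms}} (f_r(\alpha,\epsilon) - E[f_r])^2 \leq N_r \sum_{\alpha,\epsilon} (f_r(\alpha,\epsilon) - E[f_r])^2 = N_r \cdot o(n^{r+2}) = o(n^{2r+2}) . \]

It follows that
\[ |N_{r+1} - N_r E[f_r]| = o(n^{r+1}) , \]

and hence, by the induction hypothesis,
\begin{align*}
N_{r+1} &= N_r E[f_r] + o(n^{r+1}) \\
&= (1+o(1))\, n_{(r)} 2^{-\binom{r}{2}} (n-r) 2^{-r} + o(n^{r+1}) \\
&= (1+o(1))\, n_{(r+1)} 2^{-\binom{r+1}{2}} .
\end{align*}

This completes the proof of the induction step and establishes Theorem 9.3.1. ∎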


There are many examples of families of quasirandom graphs. The most widely
used is probably the family of Paley graphs $G_p$ defined as follows. For a prime
$p$ congruent to 1 modulo 4, let $G_p$ be the graph whose vertices are the integers
$0,1,2,\ldots,p-1$ in which $i$ and $j$ are adjacent if and only if $i - j$ is a quadratic
residue modulo $p$. The graphs $G_p$, which are the undirected analogues of the quadratic
residue tournaments discussed in Section 9.1, are $(p-1)/2$-regular. For any two
distinct vertices $i$ and $j$ of $G_p$, the number of vertices $k$ that are either adjacent to
both $i$ and $j$ or nonadjacent to both is precisely the number of times the quotient
$(k-i)/(k-j)$ is a quadratic residue. As $k$ ranges over all numbers between 0 and
$p-1$ but $i$ and $j$, this quotient ranges over all numbers but 1 and 0 and hence it is a
quadratic residue precisely $\tfrac12(p-1) - 1$ times. (This is essentially the same assertion
as that of the first fact given in the proof of Theorem 9.1.1.) We have thus shown
that for every two vertices $i$ and $j$ of $G_p$, $s(i,j) = (p-3)/2$, and this, together
with the fact that $G_p$ is $(p-1)/2$-regular, easily implies that it satisfies Property $P_6$.
Therefore it is quasirandom. As is the case with the quadratic residue tournaments,
$G_p$ satisfies, in fact, some stronger pseudorandom properties that are not satisfied by
every quasirandom graph and that can be proved by applying Weil’s Theorem.
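The two facts used for the Paley graph, regularity of degree $(p-1)/2$ and the constant agreement count $(p-3)/2$, can be checked directly for a small prime. The sketch below is illustrative only; the function names are hypothetical, and the agreement count is taken over vertices $k$ other than $i$ and $j$, as in the paragraph above.

```python
def paley_graph(p):
    """Adjacency matrix of the Paley graph on Z_p; assumes p is prime with p % 4 == 1."""
    residues = {(x * x) % p for x in range(1, p)}   # nonzero quadratic residues mod p
    return [[i != j and (i - j) % p in residues for j in range(p)] for i in range(p)]

def agreement_count(adj, i, j):
    """Number of k other than i, j that are adjacent to both or to neither."""
    return sum(1 for k in range(len(adj))
               if k not in (i, j) and adj[i][k] == adj[j][k])

p = 13
adj = paley_graph(p)
degrees = {sum(row) for row in adj}
agreements = {agreement_count(adj, i, j)
              for i in range(p) for j in range(p) if i != j}
print(degrees, agreements)   # {6} {5}, that is (p-1)/2 and (p-3)/2
```

The condition $p \equiv 1 \pmod 4$ is what makes $-1$ a quadratic residue, so that the adjacency relation is symmetric.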


EXERCISES 


1. By considering a random bipartite three-regular graph on $2n$ vertices obtained
by picking three random permutations between the two color classes, prove
that there is a $c > 0$ such that for every $n$ there exists a $(2n, 3, c)$-expander.

2. Let $G = (V,E)$ be an $(n,d,\lambda)$-graph, suppose $n$ is divisible by $k$, and let
$C : V \to \{1,2,\ldots,k\}$ be a coloring of $V$ by $k$ colors, so that each color
appears precisely $n/k$ times. Prove that there is a vertex of $G$ which has a
neighbor of each of the $k$ colors, provided $k\lambda \leq d$.

3. Let $G = (V,E)$ be a graph in which there is at least one edge between any two
disjoint sets of size $a+1$. Prove that for every set $Y$ of $5a$ vertices, there is a set
$X$ of at most $a$ vertices, such that for every set $Z$ satisfying $Z \cap (X \cup Y) = \emptyset$
and $|Z| \leq a$, the inequality $|N(Z) \cap Y| \geq 2|Z|$ holds.

4. Prove that for every $\epsilon > 0$ there exists an $n_0 = n_0(\epsilon)$ so that for every
$(n, n/2, 2\sqrt{n})$-graph $G = (V,E)$ with $n > n_0$, the number of triangles $M$ in
$G$ satisfies $|M - n^3/48| \leq \epsilon n^3$.


THE PROBABILISTIC LENS: 
Random Walks 


A vertex-transitive graph is a graph $G = (V,E)$ such that for any two vertices
$u, v \in V$ there is an automorphism of $G$ that maps $u$ into $v$. A random walk of
length $l$ in $G$ starting at a vertex $v$ is a randomly chosen sequence $v = v_0, v_1, \ldots, v_l$,
where each $v_{i+1}$ is chosen, randomly and independently, among the neighbors of $v_i$
($0 \leq i < l$).

The following theorem states that for every vertex-transitive graph G, the proba- 
bility that a random walk of even length in G ends at its starting point is at least as 
big as the probability that it ends at any other vertex. Note that the proof requires 
almost no computation. We note also that the result does not hold for general regular 
graphs, and the vertex transitivity assumption is necessary. 


Theorem 1 Let $G = (V,E)$ be a vertex-transitive graph. For an integer $k$ and for
two (not necessarily distinct) vertices $u, v$ of $G$, let $P^k(u,v)$ denote the probability
that a random walk of length $k$ starting at $u$ ends at $v$. Then, for every integer $k$ and
for every two vertices $u, v \in V$,

\[ P^{2k}(u,u) \geq P^{2k}(u,v). \]

Proof. We need the following simple inequality, sometimes attributed to Chebyshev.

Claim 9.4.1 For every sequence $(a_1,\ldots,a_n)$ of $n$ reals and for any permutation $\pi$
of $\{1,\ldots,n\}$,
\[ \sum_{i=1}^{n} a_i a_{\pi(i)} \leq \sum_{i=1}^{n} a_i^2 . \]

Proof. The inequality follows immediately from the fact that

\[ \sum_{i=1}^{n} a_i^2 - \sum_{i=1}^{n} a_i a_{\pi(i)} = \frac12 \sum_{i=1}^{n} (a_i - a_{\pi(i)})^2 \geq 0. \]
∎

Consider, now, a random walk of length $2k$ starting at $u$. By summing over all the
possibilities of the vertex the walk reaches after $k$ steps we conclude that for every
vertex $v$,

\[ P^{2k}(u,v) = \sum_{w \in V} P^k(u,w) P^k(w,v) = \sum_{w \in V} P^k(u,w) P^k(v,w) , \tag{1} \]

where the last equality follows from the fact that $G$ is an undirected regular graph.
Since $G$ is vertex-transitive, the two vectors $(P^k(u,w))_{w \in V}$ and $(P^k(v,w))_{w \in V}$
can be obtained from each other by permuting the coordinates. Therefore, by the
claim above, the maximum possible value of the sum in the right-hand side of (1) is
when $u = v$, completing the proof of the theorem. ∎
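The statement is easy to test numerically on a concrete vertex-transitive graph. The sketch below uses a cycle (chosen here only as a convenient example) and exact rational arithmetic, and checks that every even-length return probability is at least every other entry of its row; the helper names are hypothetical.

```python
from fractions import Fraction

def cycle(n):
    """Adjacency matrix of the n-cycle, a vertex-transitive graph."""
    return [[int((i - j) % n in (1, n - 1)) for j in range(n)] for i in range(n)]

def transition_matrix(adj):
    """One step of the simple random walk: move to a uniformly random neighbor."""
    n = len(adj)
    return [[Fraction(adj[u][w], sum(adj[u])) for w in range(n)] for u in range(n)]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def walk_probabilities(adj, length):
    """Matrix whose (u, v) entry is the probability that a walk of the given length from u ends at v."""
    n = len(adj)
    step = transition_matrix(adj)
    result = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for _ in range(length):
        result = mat_mul(result, step)
    return result

adj = cycle(7)
diagonal_dominates = all(
    P[u][u] >= P[u][v]
    for length in (2, 4, 6)
    for P in [walk_probabilities(adj, length)]
    for u in range(7) for v in range(7)
)
```

Replacing `cycle(7)` by a regular graph that is not vertex-transitive is a quick way to look for the counterexamples mentioned before the theorem.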
Random Graphs


It is six in the morning. The house is asleep. Nice music is playing. I prove and
conjecture.


— Paul Erdős, in a letter to Vera Sós


Let n be a positive integer, 0 < p < 1. The random graph G(n, p) is a probability 
space over the set of graphs on the vertex set {1,...,n} determined by 


\[ \Pr[\{i,j\} \in G] = p \]


with these events mutually independent. This model is often used in the probabilistic 
method for proving the existence of certain graphs. In this chapter we study the 
properties of G(n, p) for their own sake. 

The theory of random graphs is an active area of research that combines probability theory
and graph theory. The subject began in 1960 with the monumental paper On the
Evolution of Random Graphs by Paul Erdős and Alfréd Rényi. The book Random
Graphs by Bollobás (2001) is the standard source for the field. Another book, also
entitled Random Graphs, by Janson, Łuczak and Ruciński (2000), is also excellent. In
this chapter we explore only a few of the many topics in this fascinating area.

There is a compelling dynamic model for random graphs. For all pairs $i, j$ let
$x_{i,j}$ be selected uniformly from $[0,1]$, the choices mutually independent. Imagine
$p$ going from 0 to 1. Originally, all potential edges are “off.” The edge from $i$ to $j$
(which we may imagine as a neon light) is turned on when $p$ reaches $x_{i,j}$ and then
stays on. At $p = 1$ all edges are “on.” At time $p$ the graph of all “on” edges has
distribution $G(n,p)$. As $p$ increases $G(n,p)$ evolves from empty to full.
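The dynamic model translates almost word for word into code: draw one uniform threshold per pair and read off the graph at any time $p$. The sketch below is a minimal illustration; the function names and the fixed seed are assumptions made for reproducibility, not part of the text.

```python
import random
from itertools import combinations

def sample_thresholds(n, rng):
    """One independent uniform threshold x_{i,j} in [0, 1) for each pair i < j."""
    return {pair: rng.random() for pair in combinations(range(n), 2)}

def graph_at(thresholds, p):
    """Edge set at time p: the pairs whose threshold has been reached."""
    return {pair for pair, x in thresholds.items() if x <= p}

rng = random.Random(0)                     # fixed seed, for reproducibility only
thresholds = sample_thresholds(8, rng)
snapshots = [graph_at(thresholds, p) for p in (0.0, 0.25, 0.5, 0.75, 1.0)]
```

Each snapshot has the distribution of $G(8,p)$ for its own $p$, and the snapshots are nested, which is exactly the coupling that makes monotone properties monotone in $p$.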

In their original paper, Erdős and Rényi let $G(n,e)$ be the random graph with $n$
vertices and precisely $e$ edges. Again there is a dynamic model: Begin with no edges
and add edges randomly one by one until the graph becomes full. Generally $G(n,e)$
will have properties very similar to those of $G(n,p)$ with $p \sim e/\binom{n}{2}$. We will work on the
probability model exclusively.


10.1 SUBGRAPHS 


The term “the random graph” is, strictly speaking, a misnomer. $G(n,p)$ is a probability
space over graphs. Given any graph theoretic property $A$ there will be a
probability that $G(n,p)$ satisfies $A$, which we write $\Pr[G(n,p) \models A]$. When $A$ is
monotone, $\Pr[G(n,p) \models A]$ is a monotone function of $p$. As an instructive example,
let $A$ be the event “$G$ is triangle free.” Let $X$ be the number of triangles contained in
$G(n,p)$. Linearity of expectation gives

\[ E[X] = \binom{n}{3} p^3 . \]

This suggests the parametrization $p = c/n$. Then

\[ \lim_{n \to \infty} E[X] = \lim_{n \to \infty} \binom{n}{3} p^3 = c^3/6 . \]

It turns out that the distribution of $X$ is asymptotically Poisson. In particular,

\[ \lim_{n \to \infty} \Pr[G(n,p) \models A] = \lim_{n \to \infty} \Pr[X = 0] = e^{-c^3/6} . \]

Note that
\[ \lim_{c \to 0} e^{-c^3/6} = 1, \qquad \lim_{c \to \infty} e^{-c^3/6} = 0. \]
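A short numerical check makes the limit concrete. The sketch assumes the parametrization $p = c/n$, under which the expected number of triangles $\binom{n}{3}p^3$ tends to $c^3/6$ and the limiting probability of triangle-freeness is $e^{-c^3/6}$; the function names are hypothetical.

```python
from math import comb, exp

def expected_triangles(n, p):
    """E[X] = C(n, 3) p^3, by linearity of expectation."""
    return comb(n, 3) * p ** 3

def limiting_triangle_free_probability(c):
    """Poisson limit Pr[X = 0] = exp(-c^3 / 6) for p = c / n."""
    return exp(-c ** 3 / 6)

c = 2.0
for n in (10, 100, 1000, 10000):
    print(n, expected_triangles(n, c / n))
print("limit of E[X]:", c ** 3 / 6)
print("limit of Pr[triangle-free]:", limiting_triangle_free_probability(c))
```

The exact expectation is $(c^3/6)(1 - 3/n + 2/n^2)$, so the convergence is at rate $1/n$.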


When $p = 10^{-6}/n$, $G(n,p)$ is very unlikely to have triangles and when $p = 10^{6}/n$,
$G(n,p)$ is very likely to have triangles. In the dynamic view the first triangles
almost always appear at $p = \Theta(1/n)$. If we take a function such as $p(n) = n^{-0.9}$
with $p(n) \gg n^{-1}$ then $G(n,p)$ will almost always have triangles. Occasionally
we will abuse notation and say, for example, that $G(n, n^{-0.9})$ contains a triangle,
meaning that the probability that it contains a triangle approaches 1 as $n$
approaches infinity. Similarly, when $p(n) \ll n^{-1}$, for example, $p(n) = 1/(n \ln n)$,
then $G(n,p)$ will almost always not contain a triangle and we abuse notation and say
that $G(n, 1/(n \ln n))$ is triangle-free. It was a central observation of Erdős and Rényi
that many natural graph theoretic properties become true in a very narrow range of
$p$. They made the following key definition.


Definition 4 $r(n)$ is called a threshold function for a graph theoretic property $A$ if
1. When $p(n) \ll r(n)$, $\lim_{n \to \infty} \Pr[G(n,p) \models A] = 0$,
2. When $p(n) \gg r(n)$, $\lim_{n \to \infty} \Pr[G(n,p) \models A] = 1$,

or vice versa.


In our example, $1/n$ is a threshold function for $A$. Note that the threshold function,
when one exists, is not unique. We could equally have said that $10/n$ is a threshold
function for $A$.

Let’s approach the problem of $G(n, c/n)$ being triangle-free once more. For every
set $S$ of three vertices let $B_S$ be the event that $S$ is a triangle. Then $\Pr[B_S] = p^3$.
Then “triangle-freeness” is precisely the conjunction $\wedge \overline{B_S}$ over all $S$. If the $B_S$
were mutually independent then we would have

\[ \Pr\big[\wedge \overline{B_S}\big] = \prod \Pr\big[\overline{B_S}\big] = (1 - p^3)^{\binom{n}{3}} \sim e^{-\binom{n}{3} p^3} \to e^{-c^3/6} . \]

The reality is that the $B_S$ are not mutually independent, though when $|S \cap T| \leq 1$,
$B_S$ and $B_T$ are mutually independent.

We apply Janson’s Inequality, Theorem 8.1.1. In the notation of Section 8.1,
$I = \{S \subset V(G) : |S| = 3\}$ and $S \sim T$ if and only if $|S \cap T| = 2$. Here
$\epsilon = p^3 = o(1)$, $\mu = \binom{n}{3} p^3 \sim c^3/6$ and $M = e^{-\mu(1+o(1))} = e^{-c^3/6 + o(1)}$. There are
$6\binom{n}{4} = O(n^4)$ pairs $S, T$ of triples with $S \sim T$. For each, $\Pr[B_S \wedge B_T] = p^5$. Thus

\[ \Delta = O(n^4) p^5 = n^{-1+o(1)} = o(1). \]
When $\Delta = o(1)$ Janson’s Inequality sandwiches an asymptotic bound:
\[ \lim_{n \to \infty} \Pr\big[\wedge \overline{B_S}\big] = \lim_{n \to \infty} M = e^{-c^3/6} . \]
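The count of dependent pairs used for $\Delta$ can be confirmed by enumeration: choose the four vertices involved, then the two that are shared. The sketch below counts unordered pairs of triples meeting in exactly two vertices and compares with $6\binom{n}{4}$; the function name is hypothetical.

```python
from itertools import combinations
from math import comb

def count_adjacent_triple_pairs(n):
    """Unordered pairs {S, T} of 3-subsets of an n-set with |S & T| = 2."""
    triples = [frozenset(t) for t in combinations(range(n), 3)]
    return sum(1 for S, T in combinations(triples, 2) if len(S & T) == 2)

counts = {n: count_adjacent_triple_pairs(n) for n in (4, 5, 6, 7)}
```

Two such triples span five of the six possible edges on their four vertices, which is where $\Pr[B_S \wedge B_T] = p^5$ comes from.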


Can we duplicate this success with the property $A$ that $G$ contains no (not necessarily
induced) copy of a general given graph $H$? We use the definitions of balanced
and strictly balanced of Section 4.4.


Theorem 10.1.1 Let $H$ be a strictly balanced graph with $v$ vertices, $e$ edges and $a$
automorphisms. Let $c > 0$ be arbitrary. Let $A$ be the property that $G$ contains no
copy of $H$. Then with $p = c n^{-v/e}$,

\[ \lim_{n \to \infty} \Pr[G(n,p) \models A] = \exp[-c^e/a]. \]


Proof. Let $A_\alpha$, $1 \leq \alpha \leq \binom{n}{v} v!/a$, range over the edge sets of possible copies of $H$
and let $B_\alpha$ be the event $G(n,p) \supseteq A_\alpha$. We apply Janson’s Inequality. As

\[ \lim_{n \to \infty} \mu = \lim_{n \to \infty} \binom{n}{v} v!\, p^e / a = c^e/a, \]

we find
\[ \lim_{n \to \infty} M = \exp[-c^e/a]. \]

Now we examine (as in Theorem 4.4.2)

\[ \Delta = \sum_{\alpha \sim \beta} \Pr[B_\alpha \wedge B_\beta] . \]

We split the sum according to the number of vertices in the intersection of copies $\alpha$
and $\beta$. Suppose they intersect in $j$ vertices. If $j = 0$ or $j = 1$ then $A_\alpha \cap A_\beta = \emptyset$ so
that $\alpha \sim \beta$ cannot occur. For $2 \leq j \leq v$ let $f_j$ be the maximal $|A_\alpha \cap A_\beta|$, where
$\alpha \sim \beta$ and $\alpha, \beta$ intersect in $j$ vertices. As $\alpha \neq \beta$, $f_v < e$. When $2 \leq j \leq v-1$ the
critical observation is that $A_\alpha \cap A_\beta$ is a subgraph of $H$ and hence, as $H$ is strictly
balanced,
\[ \frac{f_j}{j} < \frac{e}{v} . \]

There are $O(n^{2v-j})$ choices of $\alpha, \beta$ intersecting in $j$ points, since $\alpha, \beta$ are determined,
except for order, by $2v - j$ points. For each such $\alpha, \beta$,

\[ \Pr[B_\alpha \wedge B_\beta] = p^{|A_\alpha \cup A_\beta|} = p^{2e - |A_\alpha \cap A_\beta|} \leq p^{2e - f_j} . \]
Thus
\[ \Delta = \sum_{j=2}^{v} O(n^{2v-j})\, O\big(n^{-(v/e)(2e - f_j)}\big) . \]

But
\[ 2v - j - \frac{v}{e}(2e - f_j) = \frac{v f_j}{e} - j < 0 \]
so each term is $o(1)$ and hence $\Delta = o(1)$. By Janson’s Inequality,

\[ \lim_{n \to \infty} \Pr\big[\wedge \overline{B_\alpha}\big] = \lim_{n \to \infty} M = \exp[-c^e/a], \]

completing the proof. ∎
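As a worked instance, take $H = K_4$, which has $v = 4$ vertices, $e = 6$ edges and $a = 24$ automorphisms, so that $p = cn^{-2/3}$ and the limit is $\exp[-c^6/24]$. The sketch below evaluates the expected number of copies $\mu = \binom{n}{v} v!\, p^e / a$ for a large $n$ and compares it with $c^e/a$; the function names are hypothetical.

```python
from math import comb, exp, factorial

def expected_copies(n, v, e, a, c):
    """mu = C(n, v) * v! / a * p^e with p = c * n^(-v/e)."""
    p = c * n ** (-v / e)
    number_of_possible_copies = comb(n, v) * factorial(v) // a
    return number_of_possible_copies * p ** e

def limiting_probability_no_copy(e, a, c):
    """The limit exp(-c^e / a) from Theorem 10.1.1."""
    return exp(-c ** e / a)

mu_k4 = expected_copies(10**6, v=4, e=6, a=24, c=1.0)
limit_k4 = limiting_probability_no_copy(e=6, a=24, c=1.0)
print(mu_k4, 1 / 24, 1 - limit_k4)
```

With $c = 1$ the limiting probability of containing a $K_4$ is $1 - e^{-1/24}$.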


10.2 CLIQUE NUMBER 


In this section we fix $p = 1/2$ (other values yield similar results) and consider the
clique number $\omega(G(n,p))$. For a fixed $c > 0$ let $n, k \to \infty$ so that

\[ \binom{n}{k} 2^{-\binom{k}{2}} \to c. \]

As a first approximation,
\[ n \sim \frac{k}{e\sqrt{2}} \sqrt{2}^{\,k} \]
and
\[ k \sim \frac{2 \ln n}{\ln 2} . \]
Here $\mu \to c$ so $M \to e^{-c}$. The $\Delta$ term was examined in Section 4.5. For this $k$,
$\Delta = o(E[X]^2)$ and so $\Delta = o(1)$. Therefore

\[ \lim_{n,k \to \infty} \Pr[\omega(G(n,p)) < k] = e^{-c} . \]

Being more careful, let $n_0(k)$ be the minimum $n$ for which

\[ \binom{n}{k} 2^{-\binom{k}{2}} \geq 1. \]

Observe that for this $n$ the left-hand side is $1 + o(1)$. Note that $\binom{n}{k}$ grows, in $n$, like
$n^k$. For any $\lambda \in (-\infty, +\infty)$ if

\[ n = n_0(k) \left[ 1 + \frac{\lambda + o(1)}{k} \right] \]
then
\[ \binom{n}{k} 2^{-\binom{k}{2}} = \left[ 1 + \frac{\lambda + o(1)}{k} \right]^k = e^{\lambda} + o(1), \]
and so

\[ \Pr[\omega(G(n,p)) < k] = e^{-e^{\lambda}} + o(1). \]

As $\lambda$ ranges from $-\infty$ to $+\infty$, $e^{-e^{\lambda}}$ ranges from 1 to 0. As $n_0(k+1) \sim \sqrt{2}\, n_0(k)$
the ranges will not “overlap” for different $k$. More precisely, let $K$ be arbitrarily
large and set
\[ I_k = \left[ n_0(k) \Big( 1 - \frac{K}{k} \Big),\; n_0(k) \Big( 1 + \frac{K}{k} \Big) \right] . \]

For $k \geq k_0(K)$, $I_{k-1} \cap I_k = \emptyset$. Suppose $n \geq n_0(k_0(K))$. If $n$ lies between the
intervals (which occurs for “most” $n$), which we denote by $I_k < n < I_{k+1}$, then

\[ \Pr[\omega(G(n,p)) < k] \leq e^{-e^{K}} + o(1), \]
nearly zero, and
\[ \Pr[\omega(G(n,p)) < k+1] \geq e^{-e^{-K}} + o(1), \]

nearly one, so that

\[ \Pr[\omega(G(n,p)) = k] \geq e^{-e^{-K}} - e^{-e^{K}} + o(1) , \]

nearly one. When $n \in I_k$ we still have $I_{k-1} < n < I_{k+1}$ so that

\[ \Pr[\omega(G(n,p)) = k \text{ or } k-1] \geq e^{-e^{-K}} - e^{-e^{K}} + o(1), \]

nearly one. As $K$ may be made arbitrarily large this yields the celebrated two-point
concentration theorem on clique number, Corollary 4.5.2 in Section 4.5. Note,
however, that for most $n$ the concentration of $\omega(G(n,\tfrac12))$ is actually on a single
value!
10.3 CHROMATIC NUMBER


In this section we fix $p = 1/2$ (there are similar results for other $p$) and let $G$ be the
random graph $G(n, \tfrac12)$. We shall find bounds on the chromatic number $\chi(G)$. A
different derivation of the main result of this section is presented in Section 7.3. Set

\[ f(k) = \binom{n}{k} 2^{-\binom{k}{2}} . \]
Let $k_0 = k_0(n)$ be that value for which

\[ f(k_0 - 1) > 1 > f(k_0). \]

Then $n = \sqrt{2}^{\,k(1+o(1))}$ so for $k \sim k_0$,
\[ f(k+1)/f(k) = \frac{n}{k} 2^{-k} (1+o(1)) = n^{-1+o(1)} . \]
Set
\[ k = k(n) = k_0(n) - 4 \]
so that

\[ f(k) > n^{3+o(1)} . \]
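The value $k_0(n)$ can be computed exactly with integer arithmetic, since $f(k) > 1$ is the same as $\binom{n}{k} > 2^{\binom{k}{2}}$. The following sketch (function names hypothetical) finds the first $k$ at which $f$ drops to 1 or below and then takes $k = k_0 - 4$.

```python
from math import comb

def f_exceeds_one(n, k):
    """True when f(k) = C(n, k) 2^(-C(k, 2)) is greater than 1, using exact integers."""
    return comb(n, k) > 2 ** (k * (k - 1) // 2)

def k0(n):
    """Smallest k with f(k) <= 1; then f(k0 - 1) > 1 >= f(k0). Assumes n >= 2."""
    k = 1
    while f_exceeds_one(n, k):
        k += 1
    return k

n = 1000
k_zero = k0(n)
k = k_zero - 4
print(k_zero, k)   # 16 12
```

For $n = 1000$ this gives $k_0 = 16$, consistent with the first approximation $k \sim 2\log_2 n$ only up to lower-order terms, which are still substantial at this size.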


Now we estimate $\Pr[\omega(G) < k]$ by the Extended Janson Inequality (Theorem 8.1.2).
Here $\mu = f(k)$. (Note that Janson’s Inequality gives a lower bound of $2^{-f(k)} = 2^{-n^{3+o(1)}}$
to this probability but this is way off the mark since with probability $2^{-\binom{n}{2}}$
the random $G$ is empty!) The value $\Delta$ was examined in Section 4.5 where

\[ \frac{\Delta}{\mu^2} = \sum_{i=2}^{k-1} g(i) . \]

There $g(2) \sim k^4/n^2$ and $g(k-1) \sim 2kn2^{-k}/\mu$ were the dominating terms. In our
instance $\mu > n^{3+o(1)}$ and $2^{-k} = n^{-2+o(1)}$ so $g(2)$ dominates and

\[ \Delta \sim \frac{\mu^2 k^4}{n^2} . \]

Hence we bound the clique number probability

\[ \Pr[\omega(G) < k] < e^{-\mu^2/2\Delta} = e^{-\Theta(n^2/(\ln n)^4)} \]

as $k = \Theta(\ln n)$. [The possibility that $G$ is empty gives a lower bound so that we
may say the probability is $e^{-n^{2+o(1)}}$, though a $o(1)$ in the hyperexponent leaves lots
of room.]


Theorem 10.3.1 [Bollobás (1988)] Almost always

\[ \chi(G) \sim \frac{n}{2 \log_2 n} . \]
Proof. Let $\alpha(G) = \omega(\overline{G})$ denote, as usual, the independence number of $G$. The
complement of $G$ has the same distribution $G(n, \tfrac12)$. Hence $\alpha(G) \leq (2 + o(1)) \log_2 n$
almost always. Thus

\[ \chi(G) \geq \frac{n}{\alpha(G)} \geq \frac{n}{2 \log_2 n} (1 + o(1)) \]

almost always.

The reverse inequality was an open question for a full quarter century! Set
$m = \lfloor n/\ln^2 n \rfloor$. For any set $S$ of $m$ vertices the restriction $G|_S$ has the distribution
of $G(m, \tfrac12)$. Let $k = k(m) = k_0(m) - 4$ as above. Note

\[ k \sim 2 \log_2 m \sim 2 \log_2 n. \]

Then
\[ \Pr[\alpha(G|_S) < k] < e^{-m^{2+o(1)}} . \]

There are $\binom{n}{m} < 2^n = 2^{m^{1+o(1)}}$ such sets $S$. Hence

\[ \Pr[\alpha(G|_S) < k \text{ for some } m\text{-set } S] < 2^{m^{1+o(1)}} e^{-m^{2+o(1)}} = o(1). \]
That is, almost always every $m$ vertices contain a $k$-element independent set.

Now suppose $G$ has this property. We pull out $k$-element independent sets and
give each a distinct color until there are fewer than $m$ vertices left. Then we give each
remaining point a distinct color. By this procedure

\[ \chi(G) \leq \left\lceil \frac{n-m}{k} \right\rceil + m \leq \frac{n}{k} + m = \frac{n}{2 \log_2 n}(1 + o(1)) + o\!\left( \frac{n}{\log_2 n} \right) = \frac{n}{2 \log_2 n}(1 + o(1)) \]

and this occurs for almost all $G$. ∎
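The coloring procedure at the end of the proof can be sketched as follows. This is an illustration under stated assumptions, not an efficient algorithm: the independent sets are found by exhaustive search, which is feasible only for very small graphs, and if no $k$-element independent set is found (that is, the assumed property fails) the sketch simply falls back to giving the remaining vertices distinct colors. All names are hypothetical.

```python
from itertools import combinations

def find_independent_set(adj, vertices, k):
    """Exhaustive search for a k-element independent set among the given vertices."""
    for candidate in combinations(sorted(vertices), k):
        if all(not adj[u][v] for u, v in combinations(candidate, 2)):
            return candidate
    return None

def color_by_independent_sets(adj, k, m):
    """Pull out k-element independent sets while at least m vertices remain,
    then give every remaining vertex its own color."""
    remaining = set(range(len(adj)))
    color, next_color = {}, 0
    while len(remaining) >= m:
        independent = find_independent_set(adj, remaining, k)
        if independent is None:      # assumed property fails: stop pulling out sets
            break
        for v in independent:
            color[v] = next_color
        next_color += 1
        remaining -= set(independent)
    for v in sorted(remaining):
        color[v] = next_color
        next_color += 1
    return color

# 8-cycle as a small test graph
n = 8
adj = [[int((i - j) % n in (1, n - 1)) for j in range(n)] for i in range(n)]
coloring = color_by_independent_sets(adj, k=3, m=4)
```

The number of colors used is at most $\lceil (n-m)/k \rceil + m$ whenever every $m$-set contains a $k$-element independent set, which is the bound used in the proof.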


10.4 ZERO-ONE LAWS 


In this section we restrict our attention to graph theoretic properties expressible in
the first-order theory of graphs. The language of this theory consists of variables
$(x, y, z, \ldots)$, which always represent vertices of a graph, equality and adjacency
$(x = y, x \sim y)$, the usual Boolean connectives $(\wedge, \neg, \ldots)$ and universal and existential
quantification $(\forall_x, \exists_y)$. Sentences must be finite. As examples, one can express the
property of containing a triangle

\[ \exists_x \exists_y \exists_z [x \sim y \wedge x \sim z \wedge y \sim z], \]
having no isolated point

\[ \forall_x \exists_y [x \sim y], \]
and having radius at most two
\[ \exists_x \forall_y [\neg(y = x) \wedge \neg(y \sim x) \rightarrow \exists_z [z \sim y \wedge z \sim x]]. \]
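On a finite graph each quantifier becomes a loop over the vertices, so the three example sentences (containing a triangle, having no isolated point, having radius at most two) can be evaluated mechanically. The sketch below writes $\exists$ as `any` and $\forall$ as `all`; the function names and test graphs are chosen here for illustration.

```python
def has_triangle(adj):
    """exists x, y, z with x ~ y, x ~ z and y ~ z."""
    V = range(len(adj))
    return any(adj[x][y] and adj[x][z] and adj[y][z] for x in V for y in V for z in V)

def no_isolated_point(adj):
    """for all x there exists y with x ~ y."""
    V = range(len(adj))
    return all(any(adj[x][y] for y in V) for x in V)

def radius_at_most_two(adj):
    """exists x such that every y is equal to x, adjacent to x,
    or has a common neighbor z with x."""
    V = range(len(adj))
    return any(all(y == x or adj[y][x] or any(adj[z][y] and adj[z][x] for z in V)
                   for y in V)
               for x in V)

def cycle(n):
    return [[int((i - j) % n in (1, n - 1)) for j in range(n)] for i in range(n)]

def path(n):
    return [[int(abs(i - j) == 1) for j in range(n)] for i in range(n)]

c5, p6 = cycle(5), path(6)
```

The implication in the third sentence is evaluated in the equivalent form "not premise, or conclusion".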


For any property A and any n, p we consider the probability that the random graph 
G(n, p) satisfies A, denoted 


\[ \Pr[G(n,p) \models A] . \]


Our objects in this section will be the theorem of Glebskii, Kogan, Liagonkii and 
Talanov (1969) and independently Fagin (1976) (Theorem 10.4.1), and that of Shelah 
and Spencer (1988) (Theorem 10.4.2). 


Theorem 10.4.1 For any fixed $p$, $0 < p < 1$, and any first-order $A$,

\[ \lim_{n \to \infty} \Pr[G(n,p) \models A] = 0 \text{ or } 1. \]


Theorem 10.4.2 For any irrational $\alpha$, $0 < \alpha < 1$, setting $p = p(n) = n^{-\alpha}$ and for
any first-order $A$,

\[ \lim_{n \to \infty} \Pr[G(n,p) \models A] = 0 \text{ or } 1. \]
Both proofs are only outlined.

We shall say that a function p = p(n) satisfies the Zero-One Law if the above 
equality holds for every first-order A. 

The Glebskii–Fagin Theorem has a natural interpretation when $p = 0.5$ as then
$G(n,p)$ gives equal weight to every (labeled) graph. It then says that any first-order
property $A$ holds for either almost all graphs or for almost no graphs. The Shelah–Spencer
Theorem may be interpreted in terms of threshold functions. The general
results of Section 10.1 give, as one example, that $p = n^{-2/3}$ is a threshold function
for containment of a $K_4$. That is, when $p \ll n^{-2/3}$, $G(n,p)$ almost surely does
not contain a $K_4$, whereas when $p \gg n^{-2/3}$ it almost surely does contain a $K_4$. In
between, say, at $p = n^{-2/3}$, the probability is between 0 and 1, in this case $1 - e^{-1/24}$.
The (admittedly rough) notion is that at a threshold function the Zero-One Law will
not hold and so to say that $p(n)$ satisfies the Zero-One Law is to say that $p(n)$ is not a
threshold function, that is, it is a boring place in the evolution of the random graph, at
least through the spectacles of the first-order language. In stark terms: What happens
in the evolution of $G(n,p)$ at $p = n^{-\pi/7}$? The answer: Nothing!

Our approach to Zero-One Laws will be through a variant of the Ehrenfeucht
Game, which we now define. Let $G, H$ be two vertex disjoint graphs and $t$ a positive
integer. We define a perfect information game, denoted EHR$[G, H, t]$, with two
players, denoted Spoiler and Duplicator. The game has $t$ rounds. Each round has
two parts. First the Spoiler selects either a vertex $x \in V(G)$ or a vertex $y \in V(H)$.
He chooses which graph to select the vertex from. Then the Duplicator must select a
vertex in the other graph. At the end of the $t$ rounds $t$ vertices have been selected from
each graph. Let $x_1,\ldots,x_t$ be the vertices selected from $V(G)$ and $y_1,\ldots,y_t$ be the
vertices selected from $V(H)$, where $x_i, y_i$ are the vertices selected in the $i$th round.
Then Duplicator wins if and only if the induced graphs on the selected vertices are
order-isomorphic; that is, if for all $1 \leq i < j \leq t$,

\[ \{x_i, x_j\} \in E(G) \iff \{y_i, y_j\} \in E(H) . \]

As there are no hidden moves and no draws one of the players must have a winning
strategy and we will say that that player wins EHR$[G, H, t]$.
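For small graphs the winner of EHR$[G, H, t]$ can be determined by exhaustive game-tree search. The sketch below is a minimal illustration with hypothetical names; it implements exactly the winning condition displayed above (agreement of adjacency for every pair of rounds) and nothing more, and its running time grows like $(|V(G)| \cdot |V(H)|)^t$.

```python
def consistent(G, H, xs, ys):
    """Does the latest round agree in adjacency with every earlier round?"""
    i = len(xs) - 1
    return all(G[xs[i]][xs[j]] == H[ys[i]][ys[j]] for j in range(i))

def duplicator_wins(G, H, t, xs=(), ys=()):
    """True if Duplicator has a winning strategy from the position (xs, ys)."""
    if xs and not consistent(G, H, xs, ys):
        return False                      # a violated pair can never be repaired
    if len(xs) == t:
        return True
    # Spoiler picks x in G; Duplicator needs some reply y in H.
    for x in range(len(G)):
        if not any(duplicator_wins(G, H, t, xs + (x,), ys + (y,)) for y in range(len(H))):
            return False
    # Spoiler picks y in H; Duplicator needs some reply x in G.
    for y in range(len(H)):
        if not any(duplicator_wins(G, H, t, xs + (x,), ys + (y,)) for x in range(len(G))):
            return False
    return True

c4 = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]   # 4-cycle, no isolated point
edge_plus_point = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]             # vertex 2 is isolated
```

With these two graphs Spoiler wins the two-round game by first picking the isolated vertex and then a neighbor of Duplicator's reply, while Duplicator wins the one-round game trivially.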


Lemma 10.4.3 For every first-order $A$ there is a $t = t(A)$ so that if $G, H$ are any
graphs with $G \models A$ and $H \models \neg A$ then Spoiler wins EHR$[G, H, t]$.


A detailed proof would require a formal analysis of the first-order language so
we give only an example. Let $A$ be the property $\forall_x \exists_y [x \sim y]$ of not containing an
isolated point and set $t = 2$. Spoiler begins by selecting an isolated point $y_1 \in V(H)$,
which he can do as $H \models \neg A$. Duplicator must pick $x_1 \in V(G)$. As $G \models A$, $x_1$
is not isolated so Spoiler may pick $x_2 \in V(G)$ with $x_1 \sim x_2$ and now Duplicator
cannot pick a “duplicating” $y_2$.


Theorem 10.4.4 A function $p = p(n)$ satisfies the Zero-One Law if and only if for
every $t$, letting $G(n, p(n))$, $H(m, p(m))$ be independently chosen random graphs on
disjoint vertex sets,

\[ \lim_{m,n \to \infty} \Pr[\text{Duplicator wins EHR}[G(n,p(n)), H(m,p(m)), t]] = 1. \]


Remark. For any given choice of $G, H$ somebody must win EHR$[G, H, t]$. (That is,
there is no random play, the play is perfect.) Given this probability distribution over
$(G, H)$ there will be a probability that EHR$[G, H, t]$ will be a win for Duplicator,
and this must approach 1.


Proof. We prove only the “if” part. Suppose $p = p(n)$ did not satisfy the Zero-One
Law. Let $A$ satisfy
\[ \lim_{n \to \infty} \Pr[G(n,p(n)) \models A] = c, \]

with $0 < c < 1$. Let $t = t(A)$ be as given by the lemma. With limiting probability
$2c(1-c) > 0$ exactly one of $G(n,p(n))$, $H(n,p(n))$ would satisfy $A$ and thus Spoiler
would win, contradicting the assumption. This is not a full proof since when the
Zero-One Law is not satisfied $\lim_{n \to \infty} \Pr[G(n,p(n)) \models A]$ might not exist. If there
is a subsequence $n_i$ on which the limit is $c \in (0,1)$ we may use the same argument.
Otherwise there will be two subsequences $n_i, m_i$ on which the limit is zero and one,
respectively. Then letting $n, m \to \infty$ through $n_i, m_i$, respectively, Spoiler will win
EHR$[G, H, t]$ with probability approaching 1. ∎


Theorem 10.4.4 provides a bridge from logic to random graphs. To prove that 
p = p(n) satisfies the Zero-One Law we now no longer need to know anything about 
logic — we just have to find a good strategy for the Duplicator. 
We say that a graph $G$ has the full level $s$ extension property if for every distinct
$u_1,\ldots,u_a,v_1,\ldots,v_b \in G$ with $a + b \leq s$ there is an $x \in V(G)$ with $\{x,u_i\} \in E(G)$, $1 \leq i \leq a$, and $\{x,v_j\} \notin E(G)$, $1 \leq j \leq b$. Suppose that $G, H$ both have
the full level $s-1$ extension property. Then Duplicator wins EHR$[G, H, s]$ by the
following simple strategy. On the $i$th round, with $x_1,\ldots,x_{i-1},y_1,\ldots,y_{i-1}$ already
selected, and Spoiler picking, say, $x_i$, Duplicator simply picks $y_i$ having the same
adjacencies to the $y_j$, $j < i$, as $x_i$ has to the $x_j$, $j < i$. The full extension property
says that such a $y_i$ will surely exist.
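The full level $s$ extension property is a finite condition and can be tested by brute force on small graphs. The sketch below (hypothetical names) loops over every split of at most $s$ distinct vertices into a "must be adjacent" part and a "must be nonadjacent" part; it requires the witness $x$ to lie outside the chosen vertices, an assumption that matches the proof of Theorem 10.4.5.

```python
from itertools import combinations

def has_full_extension_property(adj, s):
    """Check the full level s extension property by exhaustive search."""
    n = len(adj)
    for size in range(s + 1):                      # size = a + b
        for chosen in combinations(range(n), size):
            for a in range(size + 1):
                for us in combinations(chosen, a):
                    vs = [w for w in chosen if w not in us]
                    witness_exists = any(
                        x not in chosen
                        and all(adj[x][u] for u in us)
                        and not any(adj[x][v] for v in vs)
                        for x in range(n)
                    )
                    if not witness_exists:
                        return False
    return True

def cycle(n):
    return [[int((i - j) % n in (1, n - 1)) for j in range(n)] for i in range(n)]

c5, k3 = cycle(5), [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
```

The 5-cycle has the level 1 property but not the level 2 property (adjacent vertices have no common neighbor), and the triangle fails already at level 1 because no vertex has a nonneighbor.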


Theorem 10.4.5 For any fixed $p$, $0 < p < 1$, and any $s$, $G(n,p)$ almost always has
the full level $s$ extension property.


Proof. For every distinct $u_1,\ldots,u_a,v_1,\ldots,v_b,x \in G$ with $a + b \leq s$ we define
$E_{u_1,\ldots,u_a,v_1,\ldots,v_b,x}$ to be the event that $\{x,u_i\} \in E(G)$, $1 \leq i \leq a$, and $\{x,v_j\} \notin E(G)$, $1 \leq j \leq b$. Then

\[ \Pr[E_{u_1,\ldots,u_a,v_1,\ldots,v_b,x}] = p^a (1-p)^b . \]

Now define

\[ E_{u_1,\ldots,u_a,v_1,\ldots,v_b} = \bigwedge_x \overline{E_{u_1,\ldots,u_a,v_1,\ldots,v_b,x}} , \]

the conjunction over $x \neq u_1,\ldots,u_a,v_1,\ldots,v_b$. These events are mutually independent
over $x$ since they involve different edges. Thus

\[ \Pr\Big[ \bigwedge_x \overline{E_{u_1,\ldots,u_a,v_1,\ldots,v_b,x}} \Big] = [1 - p^a (1-p)^b]^{n-a-b} . \]

Set $\epsilon = \min\{p, 1-p\}^s$ so that

\[ \Pr\Big[ \bigwedge_x \overline{E_{u_1,\ldots,u_a,v_1,\ldots,v_b,x}} \Big] \leq (1 - \epsilon)^{n-s} . \]
The key here is that $\epsilon$ is a fixed (dependent on $p, s$) positive number. Set

\[ E = \bigvee E_{u_1,\ldots,u_a,v_1,\ldots,v_b} , \]

the disjunction over all distinct $u_1,\ldots,u_a,v_1,\ldots,v_b \in G$ with $a + b \leq s$. There
are less than $s^2 n^s = O(n^s)$ such choices as we can choose $a, b$ and then the vertices.
Thus

\[ \Pr[E] \leq s^2 n^s (1 - \epsilon)^{n-s} . \]

But

\[ \lim_{n \to \infty} s^2 n^s (1 - \epsilon)^{n-s} = 0 \]

and so $E$ holds almost never. Thus $\neg E$, which is precisely the statement that $G(n,p)$
has the full level $s$ extension property, holds almost always. ∎
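To see how quickly the union bound $s^2 n^s (1-\epsilon)^{n-s}$ with $\epsilon = \min\{p, 1-p\}^s$ takes effect, it helps to evaluate it for concrete values. The sketch below uses a hypothetical function name and sample parameters chosen only for illustration.

```python
def extension_failure_bound(n, p, s):
    """Union bound s^2 n^s (1 - eps)^(n - s), eps = min(p, 1 - p)^s, on the
    probability that G(n, p) lacks the full level s extension property."""
    eps = min(p, 1 - p) ** s
    return s ** 2 * n ** s * (1 - eps) ** (n - s)

for n in (20, 100, 1000):
    print(n, extension_failure_bound(n, 0.5, 2))
```

For $p = 1/2$ and $s = 2$ the bound is still larger than 1 at $n = 20$, hence vacuous, but it is already below $10^{-6}$ at $n = 100$: the exponential factor eventually overwhelms the polynomial $s^2 n^s$ for every fixed $p$ and $s$.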




But now we have proved Theorem 10.4.1. For any $p \in (0,1)$ and any fixed $s$, as
$m, n \to \infty$, with probability approaching one both $G(n,p)$ and $H(m,p)$ will have the
full level $s$ extension property and so Duplicator will win EHR$[G(n,p), H(m,p), s]$.

Why can’t Duplicator use this strategy when $p = n^{-\alpha}$? We illustrate the difficulty
with a simple example. Let $0.5 < \alpha < 1$ and let Spoiler and Duplicator play a three
move game on $G, H$. Spoiler thinks of a point $z \in G$ but doesn’t tell Duplicator
about it. Instead he picks $x_1, x_2 \in G$, both adjacent to $z$. Duplicator simply picks
$y_1, y_2 \in H$, either adjacent or not adjacent dependent on whether $x_1 \sim x_2$. But now
wily Spoiler picks $x_3 = z$. $H \sim H(m, m^{-\alpha})$ does not have the full level 2 extension
property. In particular, most pairs $y_1, y_2$ do not have a common neighbor. Unless
Duplicator was lucky, or shrewd, he then cannot find $y_3 \sim y_1, y_2$ and so he loses.
This example does not say that Duplicator will lose with perfect play (indeed, we
will show that he almost always wins with perfect play); it only indicates that the
strategy used needs to be more complex.

We begin our proof of the Zero-One Law, Theorem 10.4.2. Let $\alpha \in (0,1)$, $\alpha$
irrational, be fixed. A rooted graph is a pair $(R, H)$ where $H$ is a graph on vertex
set, say, $V(H) = \{X_1,\ldots,X_r,Y_1,\ldots,Y_v\}$ and $R = \{X_1,\ldots,X_r\}$ is a specified
subset of $V(H)$, called the roots. For example, $(R, H)$ might consist of one vertex
$Y_1$, adjacent to the two roots $X_1, X_2$. Let $v = v(R, H)$ denote the number of vertices
that are not roots and let $e = e(R, H)$ denote the number of edges, excluding those
edges between two roots. We say $(R, H)$ is dense if $v - e\alpha < 0$ and sparse if
$v - e\alpha > 0$. The irrationality of $\alpha$ assures us that all $(R, H)$ are in one of these
categories. We call $(R, H)$ rigid if for all $S$ with $R \subseteq S \subset V(H)$, $(S, H)$ is dense.
We call $(R, H)$ safe if for all $S$ with $R \subset S \subseteq V(H)$, $(R, H|_S)$ is sparse. Several
elementary properties of these concepts are given as Exercise 4. We sometimes write
$(R, S)$ for $(R, H|_S)$ when the graph $H$ is understood.
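The definitions of dense, sparse, rigid and safe are finite checks over the subsets $S$ between $R$ and $V(H)$, so they can be spelled out directly. The sketch below is illustrative: the function names are hypothetical, vertices are arbitrary hashable labels, and $\alpha$ is represented by a floating-point approximation of an irrational number.

```python
from itertools import combinations

def v_minus_e_alpha(vertices, edges, roots, alpha):
    """v - e*alpha for the rooted graph (roots, H): v counts nonroots,
    e counts edges that are not between two roots."""
    v = len(vertices) - len(roots)
    e = sum(1 for a, b in edges if not (a in roots and b in roots))
    return v - e * alpha

def is_rigid(vertices, edges, roots, alpha):
    """(S, H) is dense for every S with roots <= S < V(H)."""
    nonroots = vertices - roots
    for size in range(len(nonroots)):              # proper subsets T of the nonroots
        for T in combinations(nonroots, size):
            if v_minus_e_alpha(vertices, edges, roots | set(T), alpha) >= 0:
                return False
    return True

def is_safe(vertices, edges, roots, alpha):
    """(roots, H restricted to S) is sparse for every S with roots < S <= V(H)."""
    nonroots = vertices - roots
    for size in range(1, len(nonroots) + 1):       # nonempty subsets T of the nonroots
        for T in combinations(nonroots, size):
            S = roots | set(T)
            sub_edges = [(a, b) for a, b in edges if a in S and b in S]
            if v_minus_e_alpha(S, sub_edges, roots, alpha) <= 0:
                return False
    return True

alpha = 2 ** -0.5                                  # about 0.707, stands in for an irrational alpha
two_roots = ({"X1", "X2", "Y1"}, [("X1", "Y1"), ("X2", "Y1")], {"X1", "X2"})
one_root = ({"X1", "Y1"}, [("X1", "Y1")], {"X1"})
```

For this $\alpha$ the example from the text, one nonroot adjacent to two roots, has $v - e\alpha = 1 - 2\alpha < 0$ and is rigid, while a single pendant edge has $1 - \alpha > 0$ and is safe.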

We think of rooted graphs as on abstract points. In a graph $G$ we say that vertices
$y_1,\ldots,y_v$ form an $(R, H)$ extension of $x_1,\ldots,x_r$ if whenever $X_i$ is adjacent to $Y_j$
in $H$, $x_i$ is adjacent to $y_j$ in $G$ and also whenever $Y_i$ and $Y_j$ are adjacent in $H$, $y_i$ and
$y_j$ are adjacent in $G$. Note that we allow $G$ to have more edges than $H$ and that the
edges between the roots “don’t count.”


Lemma 10.4.6 [Generic Extension] Let $(R, H)$, as given above, be safe. Let $t \geq 0$
be an arbitrary, but fixed, integer. Then in $G \sim G(n, n^{-\alpha})$ almost surely for all
$x_1,\ldots,x_r$ there exist $y_1,\ldots,y_v$ such that


(i) $y_1,\ldots,y_v$ form an $(R, H)$ extension of $x_1,\ldots,x_r$.


(ii) $x_i, y_j$ are adjacent in $G$ if and only if $X_i, Y_j$ are adjacent in $H$ and $y_i, y_j$ are
adjacent in $G$ if and only if $Y_i, Y_j$ are adjacent in $H$.


(iii) (For $t > 0$) If $z_1,\ldots,z_u$ with $u \leq t$ form a rigid $(R', H')$ extension over
$x_1,\ldots,x_r,y_1,\ldots,y_v$ then there are no adjacencies between any pair $z_k, y_j$.


Example. Let $\alpha \in (\tfrac12, 1)$, $t = 2$, and let $(R, H)$ have root $X_1$, nonroot $Y_1$ and
edge $\{X_1, Y_1\}$. Note that $(R', H')$ consisting of two roots $X_1, X_2$ with a common
neighbor $Y_1$ has $v = 1$, $e = 2$ and is rigid. Generic Extension in this instance says
that every $x_1$ has a neighbor $y_1$ such that $x_1, y_1$ do not have a common neighbor $z_1$.


Proof. From Exercise 5 almost surely every 71,...,2, has O(n”p*) (R, H) exten- 
sions y1,-..,Yy,- Our rough notion will be that the number of these y1,..., yy that 
fail to be generic, in any of the bounded number of ways that could occur, would be 
bounded by a smaller power of n. 

Call y special if y € cli4y(a1,-...,2,) (as defined below), otherwise nonspecial. 
Let K, from the Finite Closure Lemma 10.4.7 below, be an almost sure bound on the 
number of special y, uniform over all choices of the x’s. Extend (R, H) to (Rt, Ht) 
by adding K new roots and no new edges. This is still safe and of the same type 
as (R, H) so again by Exercise 5 almost surely every 24,...,@,,21,-.-,2K has 
O(n’p*) (Rt, H*) extensions y1,...,Yy. Letting the z’s include all the special 
vertices we have that almost surely every 11,...,2, has O(n” p°) (R, H) extensions 
Y1s--+,Yv With all y; nonspecial. Now we bound from above the number of those 
nonspecial (F, H) extensions that fail condition (ii) or (iii). 

Consider those extensions (R, H’) with an additional edge y;, y; or x;, y;. This 
cannot contain a rigid subextension as that would make some y; special. Hence by 
Exercise 4 it must be a safe extension. Applying Exercise 5 there are O(n”p®t!) = 
o(n”p®) such extensions. 

Consider extensions by $y_1,\ldots,y_v$ and $z_1,\ldots,z_u$ as in condition (iii) with some $z_j, y_k$ adjacent. We can further assume the $z$'s form a minimal rigid extension over the $x$'s and $y$'s. Let the $z$'s have type $(v_1, e_1)$ as an extension over the $x$'s and $y$'s so that $v_1 - e_1\alpha$ is negative. If the $y$'s and $z$'s together formed a safe extension over the $x$'s there would be $\Theta(n^{v+v_1} p^{e+e_1}) = o(n^v p^e)$ such extensions and hence at most that many choices for the $y$'s. Otherwise, by Exercise 4, there would be a rigid subextension. It could not overlap the nonspecial $y$'s. From the minimality it must be precisely all of the $z$'s. Given the $x$'s, from the Finite Closure Lemma 10.4.7 there are $O(1)$ choices for the $z$'s. Then the $y$'s form a $(v, e')$ extension over the $x$'s and $z$'s with $e' > e$. This extension has no rigid subextensions (again as the $y$'s are nonspecial) and hence is safe. Again applying Exercise 5 there are $\Theta(n^v p^{e'})$ such $y$'s for each choice of the $z$'s and so $O(n^v p^{e'}) = o(n^v p^e)$ total choices of such $y$'s.

In all cases the number of $y$'s that fail conditions (ii) or (iii) is $o(n^v p^e)$. Hence there exist $y$'s, indeed most choices of nonspecial $y$'s, that are $(R, H)$ extensions and satisfy conditions (ii) and (iii). ∎


A rigid $t$-chain in $G$ is a sequence $X = X_0 \subset X_1 \subset \cdots \subset X_K$ with all $(X_{i-1}, X_i)$ rigid and all $|X_{i+1} - X_i| \le t$. The $t$-closure of $X$, denoted by $\mathrm{cl}_t(X)$, is the maximal $Y$ for which there exists a rigid $t$-chain (of arbitrary length) $X = X_0 \subset X_1 \subset \cdots \subset X_K = Y$. When there are no such rigid $t$-chains we define $\mathrm{cl}_t(X) = X$. To see this is well defined we note (using Exercise 4) that if $X = X_0 \subset X_1 \subset \cdots \subset X_K = Z$ and $X = X_0 \subset Y_1 \subset \cdots \subset Y_L = Y$ are rigid $t$-chains then so is $X = X_0 \subset X_1 \subset \cdots \subset X_K \subset Z \cup Y_1 \subset \cdots \subset Z \cup Y_L = Z \cup Y$. Alternatively, the $t$-closure $\mathrm{cl}_t(X)$ is the minimal set containing $X$ that has no rigid extensions of $\le t$ vertices. We say $x_1,\ldots,x_r \in G$, $y_1,\ldots,y_r \in H$ have the same $t$-type if their $t$-closures are isomorphic as graphs, the isomorphism sending each $x_i$ to the corresponding $y_i$.

The $t$-closure is a critical definition, describing the possible special properties of the roots. Suppose, for example, $\alpha \in (\frac{1}{2}, 1)$ and consider $\mathrm{cl}_1(x_1, x_2)$. The only rigid extension with $t = 1$ in this range is a nonroot adjacent to two (or more) roots. A sample 1-type would be: $x_1, x_2$ have common neighbors $y_1, y_2$ and then $x_1, y_1$ have common neighbor $y_3$ and there are no further edges among these vertices and no pairs have common neighbors other than those described. A randomly chosen $x_1, x_2$ would have type: $x_1, x_2$ have no common neighbors and are not adjacent.
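The 1-closure in this range can be computed by a simple saturation loop. The following is a minimal sketch, assuming $\alpha \in (\frac{1}{2}, 1)$ so that the only rigid one-vertex extension is a vertex adjacent to at least two vertices already in the set; the function names `build_adjacency` and `one_closure` and the vertex labels are illustrative choices, not notation from the text.

```python
def build_adjacency(edges):
    """Build a symmetric adjacency map (vertex -> set of neighbours)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def one_closure(adj, roots):
    """Return cl_1(roots), assuming alpha lies in (1/2, 1).

    In that range a single new vertex is a rigid extension exactly when
    it has at least two neighbours in the current set, so we keep adding
    such vertices until none remain (a rigid 1-chain of arbitrary length).
    """
    closure = set(roots)
    changed = True
    while changed:
        changed = False
        for v in adj:
            if v not in closure and len(adj[v] & closure) >= 2:
                closure.add(v)
                changed = True
    return closure


# The sample 1-type from the text, plus a pendant vertex "w" that has
# only one neighbour in the closure and therefore stays outside.
edges = [("x1", "y1"), ("x2", "y1"), ("x1", "y2"), ("x2", "y2"),
         ("x1", "y3"), ("y1", "y3"), ("x1", "w")]
adj = build_adjacency(edges)
closure = one_closure(adj, ["x1", "x2"])
```

Here `y3` enters the closure only after `y1` has been added, which is why the definition uses chains rather than a single step.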

We can already describe the nature of Duplicator's strategy. At the end of the $r$th move, with $x_1,\ldots,x_r$ and $y_1,\ldots,y_r$ having been selected from the two graphs, Duplicator will assure that these sets have the same $a_r$-type. We shall call this the $(a_1,\ldots,a_t)$ lookahead strategy. Here $a_r$ must depend only on $t$, the total number of moves in the game, and $\alpha$. We shall set $a_t = 0$ so that at the end of the game, if Duplicator can stick to the $(a_1,\ldots,a_t)$ lookahead strategy then he has won. If, however, Spoiler picks, say, $x_{r+1}$ so that there is no corresponding $y_{r+1}$ with $x_1,\ldots,x_{r+1}$ and $y_1,\ldots,y_{r+1}$ having the same $a_{r+1}$-type, then the strategy fails and we say that Spoiler wins. The values $a_r$ give the "lookahead" that Duplicator uses but before defining them we need some preliminary results.


Lemma 10.4.7 [Finite Closure] Let $\alpha, r > 0$ be fixed. Set $\epsilon$ equal to the minimal value of $(e\alpha - v)/v$ over all integers $v, e$ with $1 \le v \le t$ and $e\alpha - v > 0$. Let $K$ be such that $r - K\epsilon < 0$. Then in $G(n, n^{-\alpha})$ almost surely

$$|\mathrm{cl}_t(X)| \le K + r$$

for all $X \subset G$ with $|X| = r$.


Proof. If not there would be a rigid $t$-chain $X = X_0 \subset X_1 \subset \cdots \subset X_L = Y$ with $K + r < |Y| \le K + r + t$. Letting $(X_{i-1}, X_i)$ have type $(v_i, e_i)$ the restriction of $G$ to $Y$ would have $r + \sum v_i$ vertices and at least $\sum e_i$ edges. But

$$\Bigl(r + \sum v_i\Bigr) - \alpha\Bigl(\sum e_i\Bigr) = r + \sum (v_i - \alpha e_i) \le r - \epsilon \sum v_i < r - K\epsilon < 0$$

and $G$ almost surely has no such subgraph. ∎
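The constants $\epsilon$ and $K$ in the Finite Closure Lemma are directly computable. The sketch below is an illustration only: the function name `closure_bound` is hypothetical, it returns the least admissible $K$ (the lemma only asks for some $K$ with $r - K\epsilon < 0$), and the example uses a rational `Fraction` purely to get exact arithmetic.

```python
import math
from fractions import Fraction


def closure_bound(alpha, t, r):
    """Return (eps, K) as in the Finite Closure Lemma.

    eps is the minimum of (e*alpha - v)/v over integers 1 <= v <= t and
    e with e*alpha - v > 0. For fixed v the quantity grows with e, so it
    is enough to test the least admissible e, namely floor(v/alpha) + 1.
    K is the least integer with r - K*eps < 0.
    """
    eps = None
    for v in range(1, t + 1):
        e = math.floor(v / alpha) + 1      # least e with e*alpha > v
        value = (e * alpha - v) / v
        eps = value if eps is None else min(eps, value)
    K = math.floor(r / eps) + 1            # least K with K*eps > r
    return eps, K


eps, K = closure_bound(Fraction(7, 10), t=2, r=2)
```

With $\alpha = 7/10$, $t = 2$, $r = 2$ the minimum is attained at $v = 2$, $e = 3$, illustrating how a good approximation of $\alpha$ by a ratio $v/e$ with small $v$ makes $\epsilon$ small and $K$ large.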


Remark. The bound on $|\mathrm{cl}_t(X)|$ given by this proof depends strongly on how close $\alpha$ may be approximated by rationals of denominator at most $t$. This is often the case. If, for example,

$$\frac{1}{2} + \frac{1}{s+1} < \alpha < \frac{1}{2} + \frac{1}{s}$$

then almost surely there will be two points $x_1, x_2 \in G(n, n^{-\alpha})$ having $s$ common neighbors so that $|\mathrm{cl}_1(x_1, x_2)| \ge s + 2$.


Now we define the $a_1,\ldots,a_t$ of the lookahead strategy by reverse induction. We set $a_t = 0$. If at the end of the game Duplicator can assure that the 0-types of $x_1,\ldots,x_t$ and $y_1,\ldots,y_t$ are the same then they have the same induced subgraphs and he has won. Suppose, inductively, that $b = a_{r+1}$ has been defined. We define $a = a_r$ to be any integer satisfying


1. $a \ge b$.
2. Almost surely $|\mathrm{cl}_b(W)| - r \le a$ for all sets $W$ of size $r + 1$.


Now we need to show that almost surely this strategy works. Let $G_1 \sim G(n, n^{-\alpha})$, $G_2 \sim G(m, m^{-\alpha})$ and Duplicator tries to play the $(a_1,\ldots,a_t)$ lookahead strategy on $\mathrm{EHR}(G_1, G_2, t)$.

Consider the $(r+1)$st move. We have $b = a_{r+1}$, $a = a_r$ as above. Points $x_1,\ldots,x_r \in G_1$, $y_1,\ldots,y_r \in G_2$ have already been selected. Set, for notational convenience, $X = \{x_1,\ldots,x_r\}$ and $Y = \{y_1,\ldots,y_r\}$. We assume Duplicator has survived thus far so that $\mathrm{cl}_a(X) \cong \mathrm{cl}_a(Y)$, the isomorphism sending each $x_i$ to the corresponding $y_i$. Spoiler picks, say, $x = x_{r+1} \in G_1$. Set $X^+ = X \cup \{x\}$ and $Y^+ = Y \cup \{y\}$, where $y$ is Duplicator's as yet undetermined countermove. We distinguish two cases.

We say Spoiler has moved inside if $x \in \mathrm{cl}_a(X)$. Then as $b \le a$, $\mathrm{cl}_b(X^+) \subseteq \mathrm{cl}_a(X)$. Duplicator looks at the isomorphism $\Psi : \mathrm{cl}_a(X) \to \mathrm{cl}_a(Y)$ and selects $y = \Psi(x)$.

We say Spoiler has moved outside if $x \notin \mathrm{cl}_a(X)$. Let NEW be those vertices of $\mathrm{cl}_b(X^+)$ that do not lie in $\mathrm{cl}_a(X)$. $\mathrm{NEW} \ne \emptyset$ as $x \in \mathrm{NEW}$. $|\mathrm{NEW}| \le a$ as $\mathrm{NEW} \subseteq \mathrm{cl}_b(X^+) - X$. Consider NEW as an $(R, H)$ extension of $\mathrm{cl}_a(X)$. This extension must be safe as otherwise it would have a rigid subextension $\mathrm{NEW}^-$ but that subextension would then be in $\mathrm{cl}_a(X)$. Duplicator now goes to $G_2$ and, applying the Generic Extension Lemma 10.4.6 with $t = b$, finds an $(R, H)$ extension of $\mathrm{cl}_a(Y)$. That is, he finds an edge preserving injection $\Psi : \mathrm{cl}_a(X) \cup \mathrm{NEW} \to G_2$ extending the isomorphism between $\mathrm{cl}_a(X)$ and $\mathrm{cl}_a(Y)$. Duplicator selects $y = \Psi(x)$.

Why does this work? Set $\mathrm{NEW}' = \Psi(\mathrm{NEW})$ and $\mathrm{CORE} = \Psi(\mathrm{cl}_b(X^+))$. We can reach $\mathrm{cl}_b(X^+)$ by a rigid $b$-chain from $X^+$ and the isomorphism gives the same chain from $Y^+$ to CORE so that $\mathrm{cl}_b(Y^+)$ contains CORE. But can it have additional vertices? We use the genericity to say no. Suppose there was a rigid extension MORE over CORE with at most $b$ nonroots. We can't have MORE entirely inside $\Psi[\mathrm{cl}_a(X) \cup \mathrm{NEW}]$ as then $\Psi^{-1}[\mathrm{MORE}]$ would be in $\mathrm{cl}_b(X^+)$ as well. Let $\mathrm{MORE}^+$ be the vertices of MORE lying outside $\Psi[\mathrm{cl}_a(X) \cup \mathrm{NEW}]$. $\mathrm{MORE}^+$ is then a rigid extension of $\Psi[\mathrm{cl}_a(X) \cup \mathrm{NEW}]$. By the genericity $\mathrm{MORE}^+$ would have no adjacencies to $\mathrm{NEW}'$ and so would be a rigid extension of $\Psi[\mathrm{cl}_a(X)] = \mathrm{cl}_a(Y)$. As $a \ge b$ the $a$-closure of a set cannot have rigid extensions with $\le b$ vertices. Hence there is no MORE.

The first move follows the same pattern but is somewhat simpler. Set $b = a_1$ and let $a$ satisfy $a \ge b$ and $a \ge |\mathrm{cl}_b(x)|$ for any $x$. Spoiler plays $x \in G_1$. (Effectively, there is no inside move as $X = \emptyset$ is the set of previous moves and $\mathrm{cl}_a(\emptyset) = \emptyset$.) Duplicator calculates the graph $H = \mathrm{cl}_b(x)$ that has, say, $v$ vertices (including $x$) and $e$ edges. Since $H$ is a subgraph of $G_1$ the threshold function for the appearance of $H$ must come before $n^{-\alpha}$. In particular, for every subgraph $H'$ of $H$ with $v'$ vertices and $e'$ edges we cannot have $v' - \alpha e' < 0$ and therefore must have $v' - \alpha e' > 0$. The conditions of Theorem 4.4.5 then apply and $G_2$ almost surely has $\Theta(m^{v - \alpha e})$ copies of $H$. Consider any graph $H^+$ consisting of $H$ together with a rigid extension of $H$ with at most $b$ vertices. Such $H^+$ would have $v + v^+$ vertices and $e + e^+$ edges with $v^+ - \alpha e^+ < 0$. The expected number of copies of $H^+$ is then $\Theta(m^{v - \alpha e + (v^+ - \alpha e^+)})$, which is $o(m^{v - \alpha e})$. Hence there will be in $G_2$ a copy of $H$ that is not part of any such $H^+$. (Effectively, this is generic extension over the empty set.) Duplicator finds the edge preserving injection $\Psi : \mathrm{cl}_b(x) \to G_2$ giving such a copy of $H$ and selects $y = \Psi(x)$.

We have shown that the $(a_1,\ldots,a_t)$ lookahead strategy almost surely results in a win for Duplicator. By Theorem 10.4.4 this implies the Zero-One Law, Theorem 10.4.2.


10.5 EXERCISES 


1. Show that there is a graph on $n$ vertices with minimum degree at least $n/2$ in which the size of every dominating set is at least $\Omega(\log n)$.


2. Find a threshold function for the property: $G(n, p)$ contains a copy of the graph consisting of a complete graph on four vertices plus an extra vertex joined to one of its vertices.


3. Let $X$ be the number of cycles in the random graph $G(n, p)$ with $p = c/n$. Give an exact formula for $E[X]$. Find the asymptotics of $E[X]$ when $c < 1$. Find the asymptotics of $E[X]$ when $c = 1$.


4. Here we write $(R, S)$ for $(R, H|_S)$, where $H$ is some fixed graph.

   • Let $R \subset S \subset T$. Show that if $(R, S)$, $(S, T)$ are both dense then so is $(R, T)$. Show that if $(R, S)$, $(S, T)$ are both sparse then so is $(R, T)$.

   • Let $R \subset S$. Show that if $(R, S)$ is rigid then $(X \cup R, X \cup S)$ is rigid for any $X$.

   • Let $R \subset U$ with $(R, U)$ not sparse. Show there is a $T$ with $R \subset T \subseteq U$ with $(R, T)$ dense. Show further there is an $S$ with $R \subset S \subseteq T$ with $(R, S)$ rigid.

   • Show that any $(R, T)$ is either rigid or sparse itself or there exists $S$ with $R \subset S \subset T$ such that $(R, S)$ is rigid and $(S, T)$ is sparse.


5. We call $(R, H)$ hinged if it is safe but there is no $S$ with $R \subset S \subset V(H)$ such that $(S, H)$ is safe. For $x_1,\ldots,x_r \in G$ let $N(x_1,\ldots,x_r)$ denote the number of $(R, H)$ extensions. Set $\mu = E[N] \sim n^v p^e$.

   • Let $(R, H)$ be hinged and fix $x_1,\ldots,x_r \in G$. Following the model of Section 8.5, especially Theorem 8.5.4, show that

   $$\Pr\bigl[\,|N(x_1,\ldots,x_r) - \mu| > \epsilon\mu\,\bigr] = o(n^{-r}).$$

   • Deduce that almost surely all $N(x_1,\ldots,x_r) \sim \mu$.

   • Show that $N(x_1,\ldots,x_r) \sim \mu$ holds for any safe $(R, H)$, by decomposing $(R, H)$ into hinged extensions.


THE PROBABILISTIC LENS: 
Counting Subgraphs 


A graph $G = (V, E)$ on $n$ vertices has $2^n$ induced subgraphs but some will surely be isomorphic. How many different subgraphs can $G$ have? Here we show that there are graphs $G$ with $2^n(1 - o(1))$ different subgraphs. The argument we give is fairly coarse. It is typical of those situations where a probabilistic approach gives fairly quick answers to questions otherwise difficult to approach.

Let $G$ be a random graph on $n$ vertices with edge probability $1/2$. Let $S \subseteq V$, $|S| = t$ be fixed. For any one to one $\rho : S \to V$, $\rho \ne \mathrm{id}$, let $A_\rho$ be the event that $\rho$ gives a graph isomorphism; that is, for $x, y \in S$, $\{x, y\} \in E \Leftrightarrow \{\rho x, \rho y\} \in E$. Set $M_\rho = \{x \in S : \rho x \ne x\}$. We split the set of $\rho$ by $g = g(\rho) = |M_\rho|$.

Consider the $g(t - g) + \binom{g}{2}$ pairs $x, y$ with $x, y \in S$ and at least one of $x, y$ in $M_\rho$. For all but at most $g/2$ of these pairs $\{x, y\} \ne \{\rho x, \rho y\}$. (The exceptions are when $\rho x = y$, $\rho y = x$.) Let $E_\rho$ be the set of pairs $\{x, y\}$ with $\{x, y\} \ne \{\rho x, \rho y\}$. Define a graph $H_\rho$ with vertices $E_\rho$ and vertex $\{x, y\}$ adjacent to $\{\rho x, \rho y\}$. In $H_\rho$ each vertex has degree at most two ($\{x, y\}$ may also be adjacent to $\{\rho^{-1}x, \rho^{-1}y\}$) and so it decomposes into isolated vertices, paths and circuits. On each such component there is an independent set of size at least one-third the number of elements, the extreme case being a triangle. Thus there is a set $I_\rho \subseteq E_\rho$ with

$$|I_\rho| \ge \frac{|E_\rho|}{3} \ge \frac{g(t - g) + \binom{g}{2} - g/2}{3}$$

so that the pairs $\{x, y\}$, $\{\rho x, \rho y\}$ with $\{x, y\} \in I_\rho$ are all distinct.

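For each $\{x, y\} \in I_\rho$ the event $\{x, y\} \in E \Leftrightarrow \{\rho x, \rho y\} \in E$ has probability $1/2$. Moreover these events are mutually independent over $\{x, y\} \in I_\rho$ since they involve distinct pairs. Thus we bound

$$\Pr[A_\rho] \le 2^{-|I_\rho|} \le 2^{-\left(g(t-g) + \binom{g}{2} - g/2\right)/3}.$$

For a given $g$ the function $\rho$ is determined by $\{x : \rho x \ne x\}$ and the values $\rho x$ for those $x$ so that there are less than $n^{2g}$ such $\rho$. We bound

$$\sum_{\rho \ne \mathrm{id}} \Pr[A_\rho] = \sum_{g=1}^{t} \sum_{g(\rho) = g} \Pr[A_\rho] \le \sum_{g=1}^{t} n^{2g}\, 2^{-\left(g(t-g) + \binom{g}{2} - g/2\right)/3}.$$

We make the rough bound

$$g(t - g) + \binom{g}{2} - \frac{g}{2} = g\left(t - \frac{g}{2} - 1\right) \ge g\left(\frac{t}{2} - 1\right),$$

since $g \le t$. Then

$$\sum_{\rho \ne \mathrm{id}} \Pr[A_\rho] \le \sum_{g=1}^{t} \left(n^2\, 2^{\frac{1}{3}\left(1 - \frac{t}{2}\right)}\right)^{g}.$$

For, again being rough, $t > 50 \ln n$, $2^{1/3 - t/6} < n^{-3}$ and $\sum_{\rho \ne \mathrm{id}} \Pr[A_\rho] = o(1)$. That is, almost surely there is no isomorphic copy of $G|_S$.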
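To get a feel for how much room the rough bound leaves, one can evaluate the sum $\sum_{g=1}^{t} n^{2g}\, 2^{-(g(t-g) + \binom{g}{2} - g/2)/3}$ numerically. The sketch below works with base-2 exponents so that no huge intermediate numbers appear; the function name `isomorphism_union_bound` is an illustrative choice, and the code assumes $t$ is large enough (as in $t > 50 \ln n$) that every exponent is negative.

```python
import math


def isomorphism_union_bound(n, t):
    """Evaluate sum_{g=1..t} n^(2g) * 2^(-(g(t-g) + C(g,2) - g/2)/3).

    Each term is computed as 2 ** exponent. For t > 50 ln n every
    exponent is negative, so the terms underflow harmlessly toward 0.
    """
    log2_n = math.log2(n)
    total = 0.0
    for g in range(1, t + 1):
        pairs = g * (t - g) + g * (g - 1) / 2 - g / 2
        total += 2.0 ** (2 * g * log2_n - pairs / 3)
    return total


n = 100
t = math.ceil(50 * math.log(n))      # t = 231 for n = 100
bound = isomorphism_union_bound(n, t)
```

Already for $n = 100$ the sum is astronomically small, which reflects how generous the constant 50 is.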

For all $S \subseteq V$ with $|S| > 50 \ln n$ let $I_S$ be the indicator random variable for there being no other subgraph isomorphic to $G|_S$. Set $X = \sum I_S$. Then $E[I_S] = 1 - o(1)$ so, by linearity of expectation (there being $2^n(1 - o(1))$ such $S$),

$$E[X] = 2^n(1 - o(1)).$$

Hence there is a specific $G$ with $X > 2^n(1 - o(1))$.


11


The Erdős–Rényi Phase Transition


Bach, Mozart, Schubert — they will never fail you. When you perform their 
work properly it will have the character of the inevitable, as in great mathematics, 
which seems always to be made of pre-existing truths. 


—E. L. Doctorow 


In their great work On the Evolution of Random Graphs, Erdős and Rényi (1960) expressed a special interest in the behavior of $\Gamma_{n,N(n)}$, the random graph with $n$ vertices and $N(n)$ edges, when $N(n)$ was near $n/2$:


Thus the situation may be summarized as follows: the largest component of $\Gamma_{n,N(n)}$ is of order $\log n$ for $N(n)/n \sim c < \frac{1}{2}$, of order $n^{2/3}$ for $N(n)/n \sim \frac{1}{2}$, and of order $n$ for $N(n)/n \sim c > \frac{1}{2}$. This double "jump" of the size of the largest component when $N(n)/n$ passes the value $\frac{1}{2}$ is one of the most striking facts concerning random graphs.


Striking, indeed. The past half century has certainly confirmed the excitement that Erdős and Rényi expressed in their discovery.


The Probabilistic Method, Third Edition By Noga Alon and Joel Spencer 
Copyright © 2008 John Wiley & Sons, Inc. 
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11.1. AN OVERVIEW 


We favor the more modern viewpoint, examining the random graph $G(n, p)$. The behavior of Erdős and Rényi's $\Gamma_{n,N(n)}$ then corresponds to that of $G(n, p)$ with $p = N(n)/\binom{n}{2}$. We shall assume $p = \Theta(n^{-1})$ throughout this chapter.


We shall call

$$p = \frac{c}{n}$$

the coarse parametrization. The value $\frac{1}{2}$ in the Erdős–Rényi formulation corresponds to the value $c = 1$ in our parametrization. Values $c < 1$ and $c > 1$ give $G(n, p)$ that are essentially different. We shall call

$$p = \frac{1}{n} + \lambda n^{-4/3}$$

the fine parametrization. The importance of this parametrization is not a priori at all obvious. Indeed, its "discovery" was one of the great advances in the field. In Section 11.7 we give a heuristic argument why this is the appropriate fine parametrization. Along with the fine parametrization we also define

$$\epsilon = \lambda n^{-1/3} \quad\text{so that}\quad p = \frac{1 + \epsilon}{n}. \tag{11.1}$$


We shall express various results in terms of either $\lambda$ or $\epsilon$ (or both), whichever best illustrates the result. We shall think of $\epsilon, \lambda$ as functions of $n$. To avoid negative numbers we shall sometimes parametrize $p = (1 - \epsilon)/n$ with $\epsilon = \lambda n^{-1/3}$. This includes functions such as $p = 1/n - 100 n^{0.01} n^{-4/3}$. Of course, for $n$ small this would give $p < 0$ and so would be nonsense. For $n$ sufficiently large we will have $p \in [0, 1]$. As our results are always asymptotic we shall allow this slight abuse of notation and consider $G(n, p)$ defined only for $n$ appropriately large.
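The relation between $\lambda$, $\epsilon$ and $p$, and the point about small $n$ giving a meaningless $p$, can be checked with a few lines of arithmetic. This is a minimal sketch; the helper name `fine_parametrization` and its return convention are assumptions made for illustration.

```python
def fine_parametrization(n, lam):
    """Return (eps, p, is_valid) for p = 1/n + lam * n**(-4/3).

    eps = lam * n**(-1/3), so that p = (1 + eps)/n as in (11.1).
    is_valid reports whether p is a legitimate probability for this n.
    """
    eps = lam * n ** (-1.0 / 3.0)
    p = (1.0 + eps) / n
    return eps, p, 0.0 <= p <= 1.0


# A point in the critical window: lambda = 2 at n = 10**6.
eps, p, ok = fine_parametrization(10**6, 2.0)

# A strongly negative lambda is nonsense for small n but fine for large n.
small = fine_parametrization(1000, -100.0)
large = fine_parametrization(10**9, -100.0)
```

For $\lambda = -100$ the value of $p$ is negative at $n = 1000$ and becomes a valid probability once $n^{1/3} > 100$.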

In describing the nature of G(n, p) we shall refer to the complexity of compo- 
nents, as defined below. Observe that complexity zero and one correspond to tree 
components and unicyclic components, respectively. 


Definition 5 A connected component of a graph G with v vertices and e edges is 
said to have complexity e — v + 1. Components with complexity zero or one are 
called simple; components with complexity greater than one are called complex. 
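The definition translates directly into code. The following sketch finds the connected components of a simple graph by breadth first search and reports, for each, its number of vertices and its complexity $e - v + 1$; the function name `component_stats` and the representation (vertices $0,\ldots,n-1$, edges as pairs) are illustrative assumptions.

```python
from collections import deque


def component_stats(n, edges):
    """Return a list of (size, complexity) pairs, largest component first.

    Vertices are 0..n-1 and edges is an iterable of pairs of a simple
    graph. The complexity of a component with v vertices and e edges is
    e - v + 1: zero for a tree, one for a unicyclic component.
    """
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    seen = set()
    stats = []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = [start]
        while queue:
            w = queue.popleft()
            for u in adj[w]:
                if u not in seen:
                    seen.add(u)
                    queue.append(u)
                    component.append(u)
        v_count = len(component)
        # Every edge inside the component is counted from both endpoints.
        e_count = sum(len(adj[w]) for w in component) // 2
        stats.append((v_count, e_count - v_count + 1))
    return sorted(stats, reverse=True)


# A triangle (unicyclic), a single edge (tree) and an isolated vertex (tree).
stats = component_stats(6, [(0, 1), (1, 2), (2, 0), (3, 4)])
```

In this example all three components are simple; adding a chord to a cycle of length four or more would produce a complex component.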


Let $C(v)$ denote the component containing a given vertex $v$. Its size $|C(v)|$ has a distribution. From the symmetry of $G(n, p)$ the distributions of all $|C(v)|$ are the same. We shall be concerned with the sizes of the largest components. We shall let $C_i$ denote the $i$th largest component and $L_i$ denote its number of vertices. Thus $L_1 = \max_v |C(v)|$. We shall be particularly interested in $L_1$, $L_2$ and whether or not they are close together.

The study of $G(n, p)$ when $p = \Theta(n^{-1})$ splits into five regions. We describe them in order of increasing $p$, thus giving some sense of the evolution.


Very Subcritical. Here we employ the coarse parametrization $p = c/n$ and assume $c$ is a constant with $c < 1$.
Example: $p = 1/(2n)$.
• All components are simple.
• $L_1 = \Theta(\ln n)$.
• $L_k \sim L_1$ for all fixed $k$.


Barely Subcritical. Here we employ the fine parametrization: $p = (1 - \epsilon)/n$ with $\epsilon = \lambda n^{-1/3}$. We assume $\epsilon = o(1)$ and $\lambda \to \infty$.
Example: $p = 1/n - n^{-4/3} n^{0.01}$.
• All components are simple.
• $L_1 = \Theta(\epsilon^{-2} \ln \lambda) = \Theta(n^{2/3} \lambda^{-2} \ln \lambda)$.
• $L_k \sim L_1$ for all fixed $k$.


The Critical Window. Here $\lambda$ is a real constant. The value $\lambda = 0$, perhaps surprisingly, has no special status.
Example: $p = 1/n + 2n^{-4/3}$.
• The largest $k$ components ($k$ fixed) all have size $L_k = \Theta(n^{2/3})$.
• Parametrizing $L_k = c_k n^{2/3}$ and letting $d_k$ denote the complexity of $C_k$, there is a nontrivial joint distribution for $c_1,\ldots,c_k, d_1,\ldots,d_k$.


Barely Supercritical. Here we employ the fine parametrization: $p = (1 + \epsilon)/n$ with $\epsilon = \lambda n^{-1/3}$. We assume $\epsilon = o(1)$ and $\lambda \to \infty$.
Example: $p = 1/n + n^{-4/3} n^{0.01}$.
• $L_1 \sim 2\epsilon n = 2\lambda n^{2/3}$.
• The largest component has complexity approaching infinity.
• All other components are simple.
• $L_2 = \Theta(\epsilon^{-2} \ln \lambda) = \Theta(n^{2/3} \lambda^{-2} \ln \lambda)$.
Note that the ratio $L_1/L_2$ goes to infinity. For this reason, in this regime we call the largest component the dominant component.


Very Supercritical. We employ the coarse parametrization and assume $c > 1$.
• $L_1 \sim yn$, where $y = y(c)$ is the positive real satisfying
$$e^{-cy} = 1 - y. \tag{11.2}$$
• The largest component has complexity approaching infinity.
• All other components are simple.
• $L_2 = \Theta(\ln n)$.
Following the terminology made famous by Erdős and Rényi, we call the largest component the giant component.


We shall give arguments for only some of the above statements, and then often in limited form. Other results are given in the exercises. Full arguments for these results, and much much more, can be found in the classic texts of Bollobás (2001) and of Janson et al. (2000).
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11.2 THREE PROCESSES 


We place here in concise form three classes of probability spaces that we shall contrast 
and analyze. Our goal is to analyze the graph branching model. It is estimated by the 
binomial branching model, and thence by the Poisson branching model, which has a 
particularly nice analysis. 


• The Poisson Branching Model

  - Parameter: Nonnegative real $c$.

  - Underlying Space: An infinite sequence $Z_t$, $t = 1, 2, \ldots$ of independent identically distributed random variables, each having Poisson distribution with mean $c$.

  - Auxiliary $Y_t$, $t \ge 0$: Given by initial value $Y_0 = 1$ and recursion $Y_t = Y_{t-1} + Z_t - 1$.

  - Auxiliary $T$: $T$ is that minimal $t$ with $Y_t = 0$. If no such $t$ exists we write $T = \infty$.

  - Nomenclature: $Z_t$ is the number of nodes born at time $t$, $Y_t$ is the queue size at time $t$, $T$ is the total size.

  - Interpretation: $T$ is the total size of a Galton–Watson process, as described in Section 11.3, using a Poisson distribution with mean $c$.

• The Binomial Branching Model

  - Parameters: Positive integer $m$, real $p \in [0, 1]$.

  - Underlying Space: An infinite sequence $Z_t$, $t = 1, 2, \ldots$ of independent identically distributed random variables, each having binomial distribution $B(m, p)$.

  - Auxiliary $Y_t$, $t \ge 0$: Given by initial value $Y_0 = 1$ and recursion $Y_t = Y_{t-1} + Z_t - 1$.

  - Auxiliary $T$: $T$ is that minimal $t$ with $Y_t = 0$. If no such $t$ exists we write $T = \infty$.

  - Nomenclature: $Z_t$ is the number of nodes born at time $t$, $Y_t$ is the queue size at time $t$, $T$ is the total size.

  - Interpretation: $T$ is the total size of a Galton–Watson process, as described in Section 11.3, using a binomial distribution with parameters $m, p$.


• The Graph Branching Model

  - Parameters: Positive integer $n$, real $p \in [0, 1]$.

  - Underlying Space: A sequence $Z_1, \ldots, Z_n$. $Z_t$ has binomial distribution with parameters $N_{t-1}, p$, with $N_{t-1}$ as given below.

  - Auxiliary $Y_t$, $t \ge 0$: Given by initial value $Y_0 = 1$ and recursion $Y_t = Y_{t-1} + Z_t - 1$.

  - Auxiliary $N_t$, $t \ge 0$: Given by initial value $N_0 = n - 1$ and recursion $N_t = N_{t-1} - Z_t$. Equivalently, $N_t = n - t - Y_t$.

  - Auxiliary $T$: $T$ is that minimal $t$ with $Y_t = 0$ or, equivalently, $N_t = n - t$. $1 \le T \le n$ always.

  - Nomenclature: $Z_t$ is the number of nodes born at time $t$, $Y_t$ is the queue size at time $t$, $N_t$ is the number of neutral vertices at time $t$, $T$ is the total size.

  - Interpretation: $T$ is the size of the component $C(v)$ of a given vertex $v$ in $G(n, p)$, as found by the Breadth First Search process described in Section 11.5.


We use the superscripts po (Poisson), bin (binomial) and gr (graph) to distinguish 
these three processes when necessary. 
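All three models share the same queue recursion, and only the law of the $Z_t$ differs. The sketch below isolates that common part: given any sequence of values $z_1, z_2, \ldots$ it returns the total size $T$. The function name `total_size` and the convention of returning `None` when the supplied sequence ends before the queue empties are illustrative assumptions.

```python
def total_size(z_values):
    """Return T, the least t with Y_t = 0, for the recursion
    Y_0 = 1, Y_t = Y_{t-1} + Z_t - 1.

    z_values supplies Z_1, Z_2, ... in order. If the sequence runs out
    while the queue is still nonempty, None is returned (for an infinite
    sequence this would correspond to T = infinity).
    """
    y = 1
    for t, z in enumerate(z_values, start=1):
        y = y + z - 1
        if y == 0:
            return t
    return None


# Root has two children, both childless: three nodes in total.
t_small = total_size([2, 0, 0])
# Queue sizes 3, 2, 2, 1, 0: five nodes in total.
t_mixed = total_size([3, 0, 1, 0, 0])
```

Values of $Z_t$ after time $T$ are simply ignored, which is the fictional continuation used in the later sections.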


11.3. THE GALTON—-WATSON BRANCHING PROCESS 


Let $Z$ be a distribution over the nonnegative integers. The Galton–Watson process begins with a single root node; we can call her Eve. Eve has $Z$ children. Each of her children (if there are any) now independently has $Z$ children. The process continues, each new offspring having an independent number $Z$ of children. Let $T$ be the total number of nodes (including Eve herself) created in the process. It is possible that the process goes on forever, in which case we write $T = \infty$.

Our analysis of the Galton–Watson process uses fictional continuation. Let $Z_t$, $t = 1, 2, \ldots$, be a countable sequence of independent identically distributed variables, each having distribution $Z$. This defines our probability space. We think of the children being born in a Breadth First Search manner; that is, Eve has her children, which are ordered in some way. Now the children, in order, have children. Each child's children are ordered in some way and this gives an ordering of Eve's grandchildren. Now the grandchildren have children in order, and the process continues. We count Eve as node number 1, her children have node numbers $2, \ldots, 1 + Z_1$ and, more generally, each node is given a distinct positive integer as its node number. We let $Z_t$ be the number of children of the $t$th node. Since the $Z_t$ are independent and have distribution $Z$ this corresponds to the Galton–Watson process. Imagine the $t$th node having $Z_t$ children and then dying. By time $t$ we mean the process after the $t$th node has had her children and died. Let $Y_t$ be the number of living children at time $t$. We set initial value $Y_0 = 1$, corresponding to the node Eve. We have the recursion


$$Y_t = Y_{t-1} + Z_t - 1 \quad\text{for all } t \ge 1.$$

There are two essentially different cases.


• $Y_t > 0$ for all $t \ge 0$. In this case the Galton–Watson process goes on forever and $T = \infty$.

• $Y_t = 0$ for some $t \ge 0$. In this case let $T$ be the least integer for which $Y_T = 0$. Then the Galton–Watson process stops with the death of the $T$th node and $T$ is the total number of nodes in the process.


Our fictional continuation enables us to consider the $Y_t$ as an infinite random walk, with step size $Z - 1$. Write $c = E[Z]$ for the mean number of children. When $c < 1$ the walk has negative drift and so tends to $-\infty$. When $c > 1$ the walk has positive drift and tends to $+\infty$. The process when $c < 1$ is called subcritical and the process when $c > 1$ is called supercritical. When $c = 1$ the walk has zero drift and the situation is especially delicate.

The above is quite general. When Z is Poisson or binomial (the only cases of 
interest to us) this yields the Poisson branching process and the binomial branching 
process of Section 11.2. 


11.4 ANALYSIS OF THE POISSON BRANCHING PROCESS 


In this section we study $T = T_c^{po}$. We often drop the value $c$ and the superscript $po$ for notational simplicity.


Theorem 11.4.1 If $c < 1$, $T$ is finite with probability one. If $c = 1$, $T$ is finite with probability one. If $c > 1$ then $T$ is infinite with probability $y = y(c)$, where $y$ is that unique positive real satisfying the equation (11.2).


Proof. Suppose $c < 1$. If $T > t$ then $Y_t > 0$ so that $Z_1 + \cdots + Z_t \ge t$. Chernoff bounds give that $\Pr[Y_t > 0] < e^{-kt}$ for a constant $k$. In particular, $\Pr[Y_t > 0] \to 0$ so that $\Pr[T > t] \to 0$ and $T$ is finite with probability one.

Suppose $c \ge 1$. Set $z = 1 - y = \Pr[T < \infty]$. Given that Eve has $i$ children the probability that the branching process is finite is $z^i$ as all $i$ branches must be finite. Thus

$$z = \sum_{i=0}^{\infty} \Pr[Z_1 = i]\, z^i = \sum_{i=0}^{\infty} e^{-c} \frac{c^i}{i!} z^i = e^{c(z-1)}.$$

Setting $y = 1 - z$ gives the equation (11.2). For $c = 1$, $e^{-y} > 1 - y$ for $y > 0$ so the solution must be $y = 0$. For $c > 1$ the function $f(y) = 1 - y - e^{-cy}$ has $f(0) = 0$, $f(1) < 0$ and $f'(0) = c - 1 > 0$ so there is a $y \in (0, 1)$ with $f(y) = 0$. Furthermore, as $f$ is concave, there is precisely one such $y$. We have shown that either $\Pr[T < \infty] = 1$ or $\Pr[T < \infty] = 1 - y > 0$. The argument that $\Pr[T < \infty] \ne 1$ (not surprising as the walk has positive drift) is left for the exercises. ∎
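The root $y(c)$ of $1 - y - e^{-cy} = 0$ has no elementary closed form, but the sign pattern established in the proof (positive just to the right of 0, negative at 1, a single crossing) makes bisection a natural way to compute it. The sketch below is one such computation; the function name `survival_probability` and the iteration count are illustrative choices.

```python
import math


def survival_probability(c, iterations=200):
    """Return the positive root y of 1 - y - exp(-c*y) = 0 for c > 1,
    and 0.0 for c <= 1.

    On (0, 1] the function f(y) = 1 - y - exp(-c*y) is positive to the
    left of the root and negative to the right, so bisection on the sign
    of f(mid) converges to the root.
    """
    if c <= 1:
        return 0.0
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        # -mid - expm1(-c*mid) equals 1 - mid - exp(-c*mid), computed
        # without cancellation when mid is small.
        f_mid = -mid - math.expm1(-c * mid)
        if f_mid > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


y2 = survival_probability(2.0)      # about 0.7968
y_near = survival_probability(1.01)  # close to 2 * 0.01
```

For $c$ slightly above one the computed root is close to $2(c - 1)$, consistent with the behavior of the barely supercritical regime.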


Theorem 11.4.2 For any positive real $c$ and any integer $k$, setting $T = T_c^{po}$,

$$\Pr[T = k] = \frac{e^{-ck} (ck)^{k-1}}{k!}.$$

We defer the proof of this classic result to Section 11.6 when we will give a probabilistic proof!

When $c = 1$ Stirling's formula gives

$$\Pr[T_1 = k] = \frac{e^{-k} k^{k-1}}{k!} \sim \frac{1}{\sqrt{2\pi}}\, k^{-3/2}. \tag{11.3}$$

This perforce approaches zero but it does so only at polynomial speed. In general,

$$\Pr[T_c = k] \sim \frac{1}{\sqrt{2\pi}}\, c^{-1} k^{-3/2} (c e^{1-c})^k.$$

For any $c \ne 1$ (whether larger or smaller than one) $c e^{1-c} < 1$ and therefore $\Pr[T_c = k]$ approaches zero at exponential speed. This gives a bound on the tail distribution

$$\Pr[T_c \ge u] < e^{-u(\alpha + o(1))}, \tag{11.4}$$

where $\alpha = c - 1 - \ln c > 0$.
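The formula for $\Pr[T = k]$ is easy to check numerically. The sketch below evaluates it in log space (to avoid overflow in $(ck)^{k-1}$ and $k!$), then sums it over $k$ and compares it with the Stirling asymptotics at $c = 1$; the function name `total_size_pmf` and the truncation point 2000 are illustrative assumptions.

```python
import math


def total_size_pmf(c, k):
    """Pr[T = k] = exp(-c*k) * (c*k)**(k-1) / k!, evaluated via logs."""
    log_value = -c * k + (k - 1) * math.log(c * k) - math.lgamma(k + 1)
    return math.exp(log_value)


# Subcritical: T is finite almost surely, so the probabilities sum to 1.
mass_sub = sum(total_size_pmf(0.5, k) for k in range(1, 2001))

# Supercritical: the probabilities sum to Pr[T < infinity] = 1 - y(c) < 1.
mass_super = sum(total_size_pmf(2.0, k) for k in range(1, 2001))

# Critical: Pr[T_1 = k] * sqrt(2*pi) * k**1.5 should be close to 1.
k = 10_000
stirling_ratio = total_size_pmf(1.0, k) * math.sqrt(2 * math.pi) * k ** 1.5
```

For $c = 2$ the missing mass, about $0.797$, is the probability that the process is infinite.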
We are particularly interested in the Poisson branching process when $c$ is near one. Let us parametrize

$$c = 1 + \epsilon.$$

When $\epsilon > 0$, $\Pr[T_{1+\epsilon} = \infty]$ is the $y = y(\epsilon) \in (0, 1)$ satisfying $f(y) = 1 - y - e^{-(1+\epsilon)y} = 0$. Some fun calculus gives

$$\Pr[T_{1+\epsilon} = \infty] \sim 2\epsilon \quad\text{as } \epsilon \to 0^+. \tag{11.5}$$


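Suppose $c \to 1^+$ so that $\epsilon \to 0^+$. We have

$$\ln(c e^{1-c}) = \ln(1 + \epsilon) - \epsilon \sim -\frac{\epsilon^2}{2}.$$

Thus

$$\Pr[T_{1+\epsilon} = u] \sim \frac{1}{\sqrt{2\pi}}\, u^{-3/2} \quad\text{for } u = o(\epsilon^{-2}).$$

Note that $\Pr[T_{1+\epsilon} = u] \sim \Pr[T_1 = u]$ in this range. When $u$ reaches order $\epsilon^{-2}$ there is a change. For $u = A\epsilon^{-2}$ and fixed $A$,

$$\Pr[T_{1+\epsilon} = A\epsilon^{-2}] \sim \frac{1}{\sqrt{2\pi}}\, \epsilon^3 A^{-3/2} e^{-A/2}.$$

When $A \to \infty$ we absorb smaller factors into the exponential term:

$$\Pr[T_{1+\epsilon} = A\epsilon^{-2}] = \epsilon^3 e^{-(1+o(1))A/2}.$$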


When $c$ is slightly less than one we can write $c = 1 - \epsilon$, where $\epsilon \to 0^+$. We have $\ln(c e^{1-c}) \sim -\frac{1}{2}\epsilon^2$, the same as for $c = 1 + \epsilon$. Indeed when $u = o(\epsilon^{-3})$,

$$\Pr[T_{1-\epsilon} = u] \sim \Pr[T_{1+\epsilon} = u].$$

For $A \to \infty$,

$$\Pr[T_{1-\epsilon} = A\epsilon^{-2}] = \epsilon^3 e^{-(1+o(1))A/2}.$$

The Poisson branching processes with means $1 + \epsilon$ and $1 - \epsilon$ look almost the same, with the (important!) distinction that the mean $1 + \epsilon$ process is sometimes infinite while the mean $1 - \epsilon$ process never is.

In short, the Poisson branching process with mean $1 \pm \epsilon$ acts as if it had mean 1 until reaching size on the order $\epsilon^{-2}$. Until then $\Pr[T_{1\pm\epsilon} = u]$ is dropping at a polynomial rate. Upon reaching order $\epsilon^{-2}$, $\Pr[T_{1\pm\epsilon} = u]$ drops exponentially in $u$.

We are particularly interested in the tail distribution. For $\epsilon \to 0^+$ and $A \to \infty$,

$$\Pr[T_{1-\epsilon} > A\epsilon^{-2}] = \epsilon\, e^{-(1+o(1))A/2}. \tag{11.6}$$

The same holds for the finite part of $T_{1+\epsilon}$:

$$\Pr[\infty > T_{1+\epsilon} > A\epsilon^{-2}] = \epsilon\, e^{-(1+o(1))A/2}.$$

When $A \to \infty$ this quantity is $o(\epsilon)$ so (11.5) gives

$$\Pr[T_{1+\epsilon} > A\epsilon^{-2}] \sim 2\epsilon \quad\text{when } \epsilon \to 0^+ \text{ and } A \to \infty. \tag{11.7}$$


11.5 THE GRAPH BRANCHING MODEL 


Abbreviation: We use BFS as an abbreviation for Breadth First Search. BFS 
algorithms are a mainstay of computer science and central to our approach. 

Let $C(v)$ denote the component, in $G(n, p)$, containing a designated vertex $v$. We generate $C(v)$ using the (standard) BFS algorithm. We begin with root $v$. In this procedure all vertices will be live, dead, or neutral. The live vertices will be contained in a queue. Initially, at time zero, $v$ is live, the queue consists of one vertex, $v$ itself, and all other vertices are neutral. At each time $t$ we remove a live vertex $w$ from the top of the queue (in computer science parlance we "pop the queue") and check all pairs $\{w, w'\}$, $w'$ neutral, for adjacency in $G$. The popped vertex $w$ is now dead. Those neutral $w'$ (if any) adjacent to $w$ are added to the bottom of the queue and are now live. (They can be placed in any particular order.) The procedure ends when the queue is empty. We let $T$ denote that time. At time $T$ all vertices are neutral or dead and the set of dead vertices is precisely the component $C(v)$. That is, $T = |C(v)|$.

Let $Z_t$ denote the number of vertices added to the queue at time $t$. Let $Y_t$ denote the size of the queue at the conclusion of time $t$. We set $Y_0 = 1$, reflecting the initial size of the queue. At time $t$ we remove one vertex and add $Z_t$ vertices to the queue so we have the recursion $Y_t = Y_{t-1} - 1 + Z_t$. Let $N_t$ denote the number of neutral vertices at time $t$. As $Z_t$ vertices switched from neutral to live at time $t$, $N_t$ satisfies the recursion $N_0 = n - 1$, $N_t = N_{t-1} - Z_t$. Equivalently, as there are $t$ dead and $Y_t$ live vertices at time $t$, $N_t = n - t - Y_t$. $Z_t$ is found by checking $N_{t-1}$ pairs for adjacency. As these pairs have not yet been examined they remain adjacent with independent probability $p$. That is,

$$Z_t \sim B(N_{t-1}, p) \sim B(n - (t-1) - Y_{t-1},\, p). \tag{11.8}$$

The graph branching process of Section 11.2 mirrors the above analysis until time $T$ and then continues until time $n$. This fictional continuation shall be useful in the analysis of $C(v)$. The graph branching process is similar to a binomial branching process in that the $Z_t$ have binomial distributions but dissimilar in that the parameter $N_{t-1}$ in the graph branching process depends on previous values $Z_i$.
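The BFS exploration can be simulated without generating the whole graph: since each pair $\{w, w'\}$ is examined at most once, its adjacency can be decided by a fresh coin flip at the moment it is checked. The following is a minimal sketch under that observation; the function name `bfs_component`, the use of vertex 0 as the root, and the seeded `random.Random` generator are illustrative assumptions.

```python
import random
from collections import deque


def bfs_component(n, p, seed=0):
    """Explore the component of vertex 0 in G(n, p) by BFS.

    Returns (T, trace) where T = |C(v)| and trace[t-1] = (Z_t, Y_t, N_t):
    vertices born at time t, queue size after time t, and neutral
    vertices after time t. Edges are sampled lazily, one coin flip per
    examined pair {w, w'} with w' neutral.
    """
    rng = random.Random(seed)
    neutral = set(range(1, n))      # everything except the root
    queue = deque([0])              # live vertices
    trace = []
    t = 0
    while queue:
        t += 1
        queue.popleft()             # the popped vertex w is now dead
        born = [u for u in sorted(neutral) if rng.random() < p]
        neutral.difference_update(born)
        queue.extend(born)
        trace.append((len(born), len(queue), len(neutral)))
    return t, trace


n = 50
T, trace = bfs_component(n, 0.05, seed=1)
```

The recorded triples satisfy $N_t = n - t - Y_t$ at every step, and the run stops exactly when the queue size reaches zero; the simulation does not perform the fictional continuation past time $T$.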

As $N_t = N_{t-1} - Z_t$, (11.8) yields $N_t \sim B(N_{t-1}, 1 - p)$. By induction we find the distributions

$$N_t \sim B(n - 1, (1 - p)^t) \quad\text{for } 0 \le t \le n.$$

If $T = t$ it is necessary (though not sufficient, due to fictitious continuation) that $N_t = n - t$. This yields the useful inequalities.


Theorem 11.5.1 In $G(n, p)$

$$\Pr[|C(v)| = t] \le \Pr[B(n - 1, (1 - p)^t) = n - t]$$

or, equivalently,

$$\Pr[|C(v)| = t] \le \Pr[B(n - 1, 1 - (1 - p)^t) = t - 1]. \tag{11.9}$$
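The right-hand side of (11.9) is a single binomial point probability and can be evaluated directly. The sketch below does so and compares it with the exact distribution in the smallest nontrivial case $n = 2$, where $|C(v)| = 1$ with probability $1 - p$ and $|C(v)| = 2$ with probability $p$; the function name `component_size_upper_bound` is an illustrative choice.

```python
import math


def component_size_upper_bound(n, p, t):
    """Right-hand side of (11.9): Pr[B(n-1, 1-(1-p)^t) = t-1]."""
    q = 1.0 - (1.0 - p) ** t
    return math.comb(n - 1, t - 1) * q ** (t - 1) * (1.0 - q) ** (n - t)


p = 0.3
bound_t1 = component_size_upper_bound(2, p, 1)   # exact value is 1 - p
bound_t2 = component_size_upper_bound(2, p, 2)   # exact value is p
```

For $t = 1$ the bound is tight, while for $t = 2$ it gives $1 - (1-p)^2 = 0.51$ against the true value $0.3$, showing the slack introduced by the fictional continuation.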


An Alternate Analysis. The following analysis of $C(v)$ on $G(n, p)$ has been explored by van der Hofstad and Spencer (2006). Each $w \ne v$ flips a coin, heads with probability $p$, repeatedly until getting a head. Let $X_w$ denote that flip on which $w$ gets a head. Suppose $X_w = j$. Then $w$ enters the BFS at time $j$. (However, it may have missed the boat if the BFS has already terminated.) This reverses the usual randomness; we are here imagining the $w \ne v$ trying to get into the BFS tree, rather than the BFS tree trying to expand by finding neutral vertices. Suppose $t = |C(v)|$. Every $w \ne v$ that is in $C(v)$ must have entered by time $t$ so $X_w \le t$. Every $w \ne v$ that is not in $C(v)$ had $t$ opportunities to enter $C(v)$ and so $X_w > t$. Thus $\Pr[|C(v)| = t]$ is at most the probability that $X_w \le t$ for precisely $t - 1$ of the $w \ne v$. For each $w \ne v$, $\Pr[X_w \le t] = 1 - (1 - p)^t$ and these events are independent over $w$, yielding (11.9). In van der Hofstad and Spencer (2006) this analysis is extended to give more precise bounds on $\Pr[|C(v)| = t]$.


11.6 THE GRAPH AND POISSON PROCESSES COMPARED 


Set $p = c/n$. A key observation is that $Z_1 \sim B(n-1, c/n)$ approaches (in $n$) the
Poisson distribution with mean $c$. Furthermore, in a rougher sense, the same holds
for $Z_t$ as long as $N_{t-1} \sim n$ or, equivalently, the number of live and dead vertices
is $o(n)$. That is, the generation of $C(v)$ mimics the Poisson branching process with
mean $c$ as long as the number of vertices found is not too large. This allows for
a very accurate description in the very subcritical regime $c < 1$. But in the very
supercritical regime $c > 1$ the relationship between the generation of $C(v)$ and the
Poisson branching process breaks down. As the number $N_{t-1}$ of neutral vertices
drops so does the expected number $E[Z_t]$ of vertices added to the queue. Eventually
the drift of the walk $Y_t$ changes from positive to negative, and this eventually causes
the process to halt. We call this phenomenon the ecological limitation. Indeed, there
must be an ecological limitation. The Poisson branching process becomes infinite
with positive probability; the component $C(v)$ tautologically cannot be greater than
$n$.
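
The graph branching process can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `graph_branching_size` and its arguments are hypothetical, the BFS is tracked through counts alone ($N_t$ neutral vertices and $Y_t$ vertices in the queue), and $Z_t \sim B(N_{t-1}, p)$ is drawn by summing Bernoulli trials.

```python
import random

def graph_branching_size(n, p, rng):
    """Sample |C(v)| in G(n, p) by tracking only counts in the BFS."""
    neutral = n - 1      # N_0: every vertex except the root v is neutral
    queue = 1            # Y_0: the root is the only live vertex
    t = 0                # number of vertices popped (dead) so far
    while queue > 0:
        t += 1
        # Z_t ~ B(N_{t-1}, p): neutral vertices adjacent to the popped vertex
        z = sum(1 for _ in range(neutral) if rng.random() < p)
        neutral -= z     # the ecological limitation: the supply shrinks
        queue += z - 1   # Y_t = Y_{t-1} + Z_t - 1
    return t             # the process halts at T = |C(v)|
```

Calling `graph_branching_size(10000, 2.0 / 10000, random.Random(1))` repeatedly gives a feel for the two outcomes in the supercritical regime: an early death or a component of linear size. Each step costs $O(n)$ here, which is fine for illustration but not for large experiments.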


Theorem 11.6.1 For any positive real $c$ and any fixed integer $k$

\[ \lim_{n \to \infty} \Pr[|C(v)| = k \text{ in } G(n, c/n)] = \Pr[T_c = k]. \]

Proof. Let $Z_i^{po}, T^{po}$ and $Z_i^{gr}, T^{gr}$ denote the values in the Poisson branching process
with parameter $c$ and the graph branching process with parameters $n, p$, respectively.
Let $\Gamma$ denote the set of $k$-tuples $\vec{z} = (z_1, \ldots, z_k)$ of nonnegative integers such that
the recursion $y_0 = 1$, $y_i = y_{i-1} + z_i - 1$ has $y_i > 0$ for $i < k$ and $y_k = 0$. Then

\[ \Pr[T^{gr} = k] = \sum \Pr[Z_i^{gr} = z_i,\ 1 \leq i \leq k], \]
\[ \Pr[T^{po} = k] = \sum \Pr[Z_i^{po} = z_i,\ 1 \leq i \leq k], \]

where both sums are over $\vec{z} \in \Gamma$. Fix such a $\vec{z}$. Then

\[ \Pr[Z_i^{gr} = z_i,\ 1 \leq i \leq k] = \prod_{i=1}^{k} \Pr[B(N_{i-1}^{gr}, p) = z_i]. \]

As $i, y_{i-1}, z_i$ are fixed, $N_{i-1}^{gr} = n - O(1)$ and $B(N_{i-1}^{gr}, p)$ approaches the Poisson
distribution. More precisely,

\[ \lim_{n \to \infty} \Pr[B(N_{i-1}^{gr}, p) = z_i] = \Pr[Z_i^{po} = z_i]. \]

Furthermore, as the products are of a fixed number of terms,

\[ \lim_{n \to \infty} \Pr[Z_i^{gr} = z_i,\ 1 \leq i \leq k] = \Pr[Z_i^{po} = z_i,\ 1 \leq i \leq k]. \]

Proof [Theorem 11.4.2]. By Theorem 11.6.1,

\[ \Pr[T_c^{po} = k] = \lim_{n \to \infty} \Pr[|C(v)| = k], \]

where the second probability is in $G(n, p)$ with $p = c/n$ and $v$ is an arbitrary vertex
of that graph. There are $\binom{n-1}{k-1}$ choices for $S = C(v)$. On any particular such $S$ there is
probability $O(p^k) = O(n^{-k})$ that $G(n, p)$ has more than $k - 1$ edges. If $G(n, p)$ has
precisely $k - 1$ edges on $S$ they must form a tree. There are $k^{k-2}$ such trees. Each
occurs with probability $p^{k-1}(1-p)^{\binom{k}{2}-k+1} \sim p^{k-1} = c^{k-1}n^{1-k}$. Thus the total
probability that $G(n, p)$ restricted to $S$ forms a connected graph is $\sim k^{k-2}c^{k-1}n^{1-k}$.
As a connected component, we must further have no edges between $C(v)$ and its
complement; this has probability $(1-p)^{k(n-k)} \sim e^{-ck}$. Thus

\[ \Pr[|C(v)| = k] \sim \binom{n-1}{k-1} k^{k-2} c^{k-1} n^{1-k} e^{-ck} \to \frac{e^{-ck}(ck)^{k-1}}{k!} \]

as desired. ∎
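
The limit in this proof is easy to check numerically. The sketch below is only an illustration; the function names are hypothetical, and logarithms are used to avoid overflow. It evaluates the limiting value $e^{-ck}(ck)^{k-1}/k!$ and the finite-$n$ expression $\binom{n-1}{k-1}k^{k-2}c^{k-1}n^{1-k}e^{-ck}$ that precedes it.

```python
import math

def poisson_tree_size_pmf(c, k):
    """Pr[T_c = k] = e^{-ck} (ck)^{k-1} / k!, computed via logarithms."""
    log_p = -c * k + (k - 1) * math.log(c * k) - math.lgamma(k + 1)
    return math.exp(log_p)

def finite_n_estimate(n, c, k):
    """binom(n-1, k-1) k^{k-2} c^{k-1} n^{1-k} e^{-ck}, the expression in the proof."""
    log_v = (math.log(math.comb(n - 1, k - 1))
             + (k - 2) * math.log(k)
             + (k - 1) * math.log(c)
             + (1 - k) * math.log(n)
             - c * k)
    return math.exp(log_v)
```

For $c < 1$ the values of `poisson_tree_size_pmf(c, k)` sum to one over $k \geq 1$; for $c > 1$ they sum to the probability that $T_c$ is finite, which is $1 - y$ in the notation of Section 11.9.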


The graph branching process can be compared to the binomial branching process
in both directions. An important cautionary note: the event $T^{bin}_{n-1,p} \geq u$ in
Theorem 11.6.2 (and similarly $T^{bin}_{n-u,p} \geq u$ in Theorem 11.6.3) includes the possibility
that the binomial branching process is infinite. Indeed, in application this will be the
critical term.


Theorem 11.6.2 For any $u$

\[ \Pr[T^{gr}_{n,p} \geq u] \leq \Pr[T^{bin}_{n-1,p} \geq u]. \]

Proof. We modify the graph branching process by constantly replenishing the supply
of neutral vertices. That is, when we pop the vertex $w$ and there are $n - 1 - s$
neutral vertices, we create $s$ fictional vertices $w'$ and allow $w, w'$ to be adjacent with
probability $p$. This gives a component of size $T^{bin}_{n-1,p}$; the actual $C(v)$ will be a subset
of it. Thus $T^{bin}_{n-1,p}$ dominates $T^{gr}_{n,p}$. ∎


Theorem 11.6.3 For any $u$

\[ \Pr[T^{gr}_{n,p} \geq u] \geq \Pr[T^{bin}_{n-u,p} \geq u]. \]

Proof. We halt the graph branching process when the number of found (live plus
dead) vertices reaches $u$. This does not affect the probability of finding at least $u$
vertices. In this truncated graph process we diminish the number of neutral vertices
to $n - u$. That is, when we pop the vertex $w$ and there are $n - 1 - s \geq n - u$ neutral
vertices, we select $n - u$ of them and only allow adjacencies $w, w'$ to them. The
truncated graph process dominates this truncated binomial $n - u, p$ process and so
has a greater or equal probability of reaching $u$. ∎


The Poisson Approximation. We are working in the range $p = O(n^{-1})$. There
the binomial $B(n-1, p)$ distribution and the Poisson distribution with mean $np$ are
very close. The Poisson branching process is precisely understood and, we feel, the
“purest” branching process. Our goal in this chapter is to give the reader a picture
for the “why” of the various regimes. To do this we shall often avoid the technical
calculations and simply assume that the binomial $n - 1, p$ branching process is very
close to the Poisson branching process with mean $np$.
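
To get a feel for how close the two distributions are, one can compute their total variation distance directly. The following is a minimal sketch, not part of the argument: the function names are hypothetical, $p = c/n$ is assumed with $0 < c < n$, and probabilities are computed through `math.lgamma` to stay within floating-point range.

```python
import math

def binomial_log_pmf(m, p, k):
    """log Pr[B(m, p) = k] for 0 < p < 1 and 0 <= k <= m."""
    return (math.lgamma(m + 1) - math.lgamma(k + 1) - math.lgamma(m - k + 1)
            + k * math.log(p) + (m - k) * math.log1p(-p))

def poisson_log_pmf(mean, k):
    """log Pr[Poisson(mean) = k]."""
    return -mean + k * math.log(mean) - math.lgamma(k + 1)

def tv_binomial_vs_poisson(n, c):
    """Total variation distance between B(n-1, c/n) and Poisson(c)."""
    m, p = n - 1, c / n
    diff = 0.0
    poisson_mass = 0.0
    for k in range(m + 1):
        b = math.exp(binomial_log_pmf(m, p, k))
        q = math.exp(poisson_log_pmf(c, k))
        diff += abs(b - q)
        poisson_mass += q
    # Poisson mass above m has no binomial counterpart.
    diff += max(0.0, 1.0 - poisson_mass)
    return 0.5 * diff
```

With $c$ fixed, the returned distance shrinks as $n$ grows, which is the sense of "very close" used here.
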


11.7 THE PARAMETRIZATION EXPLAINED


In the parametrization (11.1) for the critical window why is the exponent $-4/3$ as
opposed to, say, —i or -2 or something completely different? In the experience of
the authors this is the question most frequently asked about the Erdős–Rényi phase
transition. Here is a heuristic that may be helpful.

Parametrize $p = (1 + \epsilon)/n$ with $\epsilon = \epsilon(n)$ positive and approaching zero. We
look for the following picture. Consider the Poisson branching process $T = T^{po}_{1+\epsilon}$. It
is infinite with probability $\sim 2\epsilon$, otherwise its probability of exceeding $A\epsilon^{-2}$ drops
exponentially in $A$. The graph branching process mimics the Poisson branching
process as long as it is not too successful. The cases when the Poisson branching
process is finite are mimicked, yielding components of size up to roughly $\epsilon^{-2}$. The
cases when the Poisson branching process is infinite are mimicked by components that
“escape” until the ecological limitation sets in. These components all join together.
They form a single component, the dominant component, of size $\sim 2\epsilon n$.

In order for the above (admittedly rough) picture to hold there needs to be a
distinction between the small components, up to size $\epsilon^{-2}$, and the dominant component
of size $2\epsilon n$. That is, we need $2\epsilon n \gg \epsilon^{-2}$. This heuristic leads us to $\epsilon = n^{-1/3}$ as the
breakpoint. When $\epsilon \gg n^{-1/3}$ we have the distinction between small and dominant
and are in the supercritical regime. When $\epsilon = O(n^{-1/3})$ there is no effective analogy
to the Poisson branching process being infinite, and there is no dominant component.
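
Spelled out, the arithmetic behind the breakpoint is short. This is only a worked restatement of the heuristic above, with the scaling $\epsilon = \lambda n^{-1/3}$ taken from the later sections:

\[ 2\epsilon n \gg \epsilon^{-2} \iff \epsilon^{3} \gg \frac{1}{2n} \iff \epsilon \gg n^{-1/3}. \]

Writing $\epsilon = \lambda n^{-1/3}$ then gives

\[ p = \frac{1+\epsilon}{n} = \frac{1}{n} + \lambda n^{-4/3}, \]

so the exponent $-4/3$ is just $-1/3 - 1$: the breakpoint scale for $\epsilon$ divided by $n$.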


11.8 THE SUBCRITICAL REGIMES 


Let $p = c/n$ with $c < 1$. Theorem 11.6.2 gives

\[ \Pr[T^{gr}_{n,p} \geq u] \leq \Pr[T^{bin}_{n-1,p} \geq u]. \]

With the Poisson approximation,

\[ \Pr[|C(v)| \geq u] \leq (1 + o(1)) \Pr[T_c \geq u]. \]

From (11.4) this drops exponentially in $u$. Taking $u = K \ln n$ for appropriately
large $K$, $\Pr[|C(v)| \geq u] < n^{-1.01}$. As this holds for each of the $n$ vertices $v$,
the probability that any $v$ has $|C(v)| \geq u$ is less than $n \cdot n^{-1.01} \to 0$. That is,
$L_1 = O(\ln n)$ with probability tending to one.

Let’s push this argument into the barely subcritical regime $p = (1 - \epsilon)/n$ with
$\epsilon = \lambda n^{-1/3}$. Let $I_v$ be the indicator random variable for $C(v)$ having at least
$u$ vertices, $u$ to be determined below. As above, Theorem 11.6.2 and our Poisson
approximation give the bound

\[ \Pr[|C(v)| \geq u] \leq (1 + o(1)) \Pr[T_{1-\epsilon} \geq u]. \]

We now parametrize

\[ u = K\epsilon^{-2} \ln \lambda = K n^{2/3} \lambda^{-2} \ln \lambda. \]

For an appropriately large constant $K$ the bound (11.6) gives

\[ \Pr[T_{1-\epsilon} \geq u] \leq \epsilon \lambda^{-3}. \]

Let $X = \sum_v I_v$ be the number of vertices $v$ in components of size at least $u$ and let
$Y$ be the number of components of $G(n, p)$ of size at least $u$. Linearity of expectation
gives

\[ E[X] = nE[I_v] \leq n\epsilon\lambda^{-3} = n^{2/3}\lambda^{-2}. \]

As $Y \leq Xu^{-1}$,

\[ E[Y] \leq u^{-1}E[X] \leq K^{-1}(\ln \lambda)^{-1} \to 0. \]

With probability approaching one, $Y = 0$ and so

\[ L_1 \leq u = K\epsilon^{-2} \ln \lambda = K n^{2/3} \lambda^{-2} \ln \lambda. \]


11.9 THE SUPERCRITICAL REGIMES 


In the supercritical regimes there are two salient points about the giant or dominant
component: First, it exists. Second, it is unique. Neither is trivial.


The Very Supercritical Phase. We start with the very supercritical region, $p = c/n$,
with $c > 1$ constant. The ideas here will carry into the barely supercritical
region. Let $y = y(c)$ be the positive real solution of the equation $e^{-cy} = 1 - y$. Let
$\delta$ be an arbitrarily small constant and let $K$ be an appropriately large constant. Set
$S = K \ln n$, $L^- = (y - \delta)n$ and $L^+ = (y + \delta)n$. Call a component $C(v)$ and its size
$|C(v)|$ small if $|C(v)| < S$, giant if $L^- < |C(v)| < L^+$ and awkward otherwise.
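
The constant $y(c)$ has no closed form in elementary functions, but it is easy to compute. The following is a minimal sketch by bisection; the function name `giant_fraction` and the tolerance are assumptions made for illustration. The starting point `lo = (c - 1) / c**2` is chosen because $1 - e^{-x} > x - x^2/2$ for $x > 0$, which makes $1 - e^{-cy} - y$ positive there.

```python
import math

def giant_fraction(c, tol=1e-14):
    """Positive solution y of exp(-c*y) = 1 - y for c > 1, by bisection."""
    if c <= 1:
        raise ValueError("a positive solution exists only for c > 1")

    def g(y):
        # g(y) = (1 - exp(-c*y)) - y; expm1 keeps precision for small y
        return -math.expm1(-c * y) - y

    lo = (c - 1) / (c * c)   # g(lo) > 0
    hi = 1.0                 # g(1) = -exp(-c) < 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, `giant_fraction(2.0)` is about 0.797, and for $c = 1 + \epsilon$ with small $\epsilon$ the value is close to $2\epsilon$, matching the barely supercritical picture.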


No Middle Ground. We claim that the probability of having any awkward component
is $o(n^{-20})$. (We could make ‘20’ arbitrarily large by changing $K$.) There
are $n$ choices for $v$ and $n$ choices for $t = |C(v)|$. Thus it suffices to show
for any $v$ and for any awkward $t$ that $\Pr[|C(v)| = t] = o(n^{-18})$. From Theorem 11.5.1
it suffices to bound $\Pr[B(n-1, 1-(1-c/n)^t) = t-1]$. We indicate
the technical calculations. When $t = o(n)$, $1 - (1-c/n)^t \sim ct/n$ and $c > 1$ so
$\Pr[B(n-1, 1-(1-c/n)^t) \leq t-1]$ is exponentially small in $t$. As $t \geq K \ln n$
this is polynomially small in $n$. When $t \sim xn$, $1 - (1-c/n)^t \sim 1 - e^{-cx}$. For
$x \neq y$, $1 - e^{-cx} \neq x$ so the mean of the binomial is not near $t$ and the probability that
it is equal to $t$ is exponentially small in $n$. In all cases the bounds on $\Pr[|C(v)| = t]$
follow from basic Chernoff bounds.


Escape Probability. Set $\alpha = \Pr[C(v) \text{ is not small}]$. [When this happens we
like to think that the BFS on $G(n, p)$ starting with root $v$ has escaped an early death.]
Theorems 11.6.2 and 11.6.3 sandwich

\[ \Pr[T^{bin}_{n-S,p} \geq S] \leq \alpha \leq \Pr[T^{bin}_{n-1,p} \geq S]. \]

From our Poisson approximation both $\Pr[T^{bin}_{n-S,p} \geq S]$ and $\Pr[T^{bin}_{n-1,p} \geq S]$ are
asymptotic to $\Pr[T_c \geq S]$. Thus $\alpha \sim \Pr[T_c \geq S]$. As $c$ is assumed fixed and
$S \to \infty$,

\[ \alpha \sim \Pr[T_c \geq S] \sim \Pr[T_c = \infty] = y, \]

with $y$ as in (11.2).

Because there is no middle ground, not small is the same as giant. $C(v)$ is giant
with probability $\sim y$. Thus the expected number of vertices in giant components is
$\sim yn$. Each giant component has size between $(y - \delta)n$ and $(y + \delta)n$. Our goal
is a single giant component of size $\sim yn$. We are almost there, but maybe with
probability 1/2 there are two giant components.


Sprinkling. Set $p_1 = n^{-3/2}$. (Any $p_1$ with $n^{-2} \ll p_1 \ll n^{-1}$ would do here.)
Let $G_1 \sim G(n, p_1)$ be selected independently from $G \sim G(n, p)$ on the same vertex
set and let $G^+ = G \cup G_1$ so that $G^+ \sim G(n, p^+)$ with $p^+ = p + p_1 - pp_1$. (We
“sprinkle” the relatively few edges of $G_1$ on $G$ to make $G^+$.) Suppose $G(n, p)$ had
more than one giant component and let $V_1, V_2$ be the vertex sets of two of those
components. There are $\Omega(n^2)$ pairs $\{v_1, v_2\}$ with $v_1 \in V_1$, $v_2 \in V_2$. We have
selected $p_1$ large enough so that with probability $1 - o(1)$ at least one of these
pairs is in the sprinkling $G_1$. Adding this edge merges components $V_1, V_2$ into a
component of size at least $2(y - \delta)n$ in $G^+$. We have selected $p_1$ small enough so
that $p^+ \sim p = c/n$. The probability that $G^+$ has a component so large, and hence
awkward, is therefore $o(n^{-20})$. Hence the probability that $G$ had more than one giant
component is $o(n^{-20})$.
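
As a sketch of why $p_1 = n^{-3/2}$ is large enough, assume as above that $V_1$ and $V_2$ each have at least $(y - \delta)n$ vertices. Each of the $|V_1||V_2|$ pairs is absent from $G_1$ independently with probability $1 - p_1$, and $G_1$ is independent of $G$, so

\[ \Pr[\text{no pair } \{v_1, v_2\} \text{ lies in } G_1] = (1 - p_1)^{|V_1||V_2|} \leq \exp\left(-p_1 (y-\delta)^2 n^2\right) = \exp\left(-(y-\delta)^2 n^{1/2}\right), \]

which is smaller than any fixed power of $n$.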

Finally, we make $\delta$ arbitrarily small. $G(n, p)$ has an expected number $\sim yn$ of
points in giant components and giant components all have size $\sim yn$. Furthermore,
by the sprinkling argument, the contribution to this expectation from the possibility
of $G$ having more than one giant component is $o(n \cdot n^{-20})$, which is negligible. Thus
with probability $1 - o(1)$ there is precisely one giant component. This gives the salient
features of the very supercritical phase. There is a giant component so $L_1 \sim yn$.
There is only one giant component and no middle ground so $L_2 \leq S = O(\ln n)$.

The sprinkling for complexity argument given below in the barely supercritical
phase can easily be modified to show that the giant component has high complexity,
indeed, complexity $\Omega(n)$.


The Barely Supercritical Phase. Set $p = (1 + \epsilon)/n$ with $\epsilon = \lambda n^{-1/3}$ and
$\lambda \to \infty$. Note $\epsilon^{-2} = \lambda^{-2}n^{2/3} \ll \epsilon n$. The analysis of the barely supercritical
region becomes more difficult as $\lambda = \lambda(n)$ approaches infinity more slowly. We shall
add the simplifying assumption that $\lambda \gg \ln n$. Furthermore, we shall find somewhat
weaker bounds than stated on $L_2$.

Bollobás (1984) showed the existence of the dominant component when
$\lambda > K\sqrt{\ln n}$, $K$ constant. That paper was the first indication of the appropriate scaling
for the critical window. Łuczak (1990a) tightened the result to “best possible,”
showing that if $\lambda \to +\infty$ then the dominant component exists.




Let $\delta$ be an arbitrarily small constant and let $K$ be an appropriately large constant.
Set $S = K\epsilon^{-2} \ln n$, $L^- = (1 - \delta)2\epsilon n$ and $L^+ = (1 + \delta)2\epsilon n$. Call a component
$C(v)$ and its size $|C(v)|$ small if $|C(v)| < S$, dominant if $L^- < |C(v)| < L^+$ and
awkward otherwise.


No Middle Ground. We claim that the probability of having any awkward
component is $o(n^{-20})$. (We could make ‘20’ arbitrarily large by changing $K$.) There
are $n$ choices for $v$ and $n$ choices for $t = |C(v)|$. Thus it suffices to show
for any $v$ and for any awkward $t$ that $\Pr[|C(v)| = t] = o(n^{-18})$. Again we bound
$\Pr[B(n-1, 1-(1-p)^t) = t-1]$. We indicate the technical calculations. Let $\mu$
and $\sigma^2$ denote the mean and the variance of the binomial. Then $\mu = (n-1)(1-(1-p)^t)$
and, in this range, $\sigma^2 \sim \mu$. When $t = o(n\epsilon)$ we estimate $1 - (1-p)^t$
by $pt$, so that $\mu \approx t + t\epsilon$. Then $\mu - t \sim t\epsilon$ and $\sigma^2 \sim t$. This probability is roughly
$\exp[-(t\epsilon)^2/2t] = \exp[-t\epsilon^2/2]$. As $t \geq S$ this is $o(n^{-18})$ for $K > 36$. [To
push $S$ down to $K\epsilon^{-2} \ln \lambda$ requires a finer bound on $\Pr[|C(v)| = t]$.] Now suppose
$t \sim xn\epsilon$, where $x \neq 2$. The ecological limitation now has an effect and we estimate
$1 - (1-p)^t$ by $pt - \frac{1}{2}p^2t^2$ so

\[ \mu - t \sim t\epsilon - \frac{t^2}{2n} \sim (n\epsilon^2)\left(x - \frac{1}{2}x^2\right). \]

[Observe that when $x = 2$ the mean of the binomial is very close to $t$ and so we do
not get a small bound on $\Pr[|C(v)| = t]$. This is natural when we consider that there
will be a dominant component of size $\sim 2\epsilon n$.] Again $\sigma^2 \sim t$ so the probability is
$\exp[-\Omega((n\epsilon^2)^2/t)]$, which is extremely small. When $t \gg n\epsilon$ the probability is even
smaller.


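Escape Probability. Set $\alpha = \Pr[C(v) \text{ is not small}]$. Theorems 11.6.2 and
11.6.3 sandwich

\[ \Pr[T^{bin}_{n-S,p} \geq S] \leq \alpha \leq \Pr[T^{bin}_{n-1,p} \geq S]. \]

The Poisson approximation for $T^{bin}_{n-1,p}$ is $T_{1+\epsilon}$. As $S \gg \epsilon^{-2}$, bound (11.7) gives

\[ \alpha \leq \Pr[T_{1+\epsilon} \geq S] \sim \Pr[T_{1+\epsilon} = \infty] \sim 2\epsilon. \]

Replacing $n - 1$ by $n - S$ lowers the mean by $\sim Sn^{-1}$. But $Sn^{-1}/\epsilon \sim
(\ln n)/(n\epsilon^3) = \lambda^{-3} \ln n$ and we have made $\lambda$ large enough that this is $o(1)$. That is,
$Sn^{-1} = o(\epsilon)$. Therefore $T^{bin}_{n-S,p}$ is approximated by $T_{1+\epsilon-o(\epsilon)}$ and

\[ \alpha \geq \Pr[T_{1+\epsilon-o(\epsilon)} \geq S] \sim \Pr[T_{1+\epsilon-o(\epsilon)} = \infty] \sim 2\epsilon. \]

Thus $\alpha$ has been sandwiched and $\alpha \sim 2\epsilon$.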

Because there is no middle ground, not small is the same as dominant. $C(v)$ is
dominant with probability $\sim 2\epsilon$. Thus the expected number of vertices in dominant
components is $\sim 2n\epsilon$. Each dominant component has size between $(1 - \delta)2n\epsilon$ and
$(1 + \delta)2n\epsilon$. As in the very supercritical case, we need to worry about having more than
one dominant component.

Sprinkling. Set $p_1 = n^{-4/3}$. Let $G_1 \sim G(n, p_1)$ be selected independently
from $G \sim G(n, p)$ on the same vertex set and let $G^+ = G \cup G_1$ so that $G^+ \sim
G(n, p^+)$ with $p^+ = p + p_1 - pp_1 = (1 + \epsilon + o(\epsilon))/n$. Suppose $G(n, p)$ had more than
one dominant component and let $V_1, V_2$ be the vertex sets of two of those components.
There are $\gg n^{4/3}$ pairs $\{v_1, v_2\}$ with $v_1 \in V_1$, $v_2 \in V_2$. With probability $1 - o(1)$ at
least one of these pairs is in the sprinkling $G_1$. Adding this edge merges components
$V_1, V_2$ into a component of size at least $(1 - \delta)4\epsilon n$ in $G^+$. The probability $G^+$ has
such a large, and hence awkward, component is $o(n^{-20})$. Thus the probability $G$
had two (or more) dominant components is $o(n^{-20})$. Taking $\delta$ arbitrarily small,
as in the very supercritical case, $G$ has with probability $1 - o(1)$ precisely one dominant
component. Thus $L_1 \sim 2n\epsilon$ and, as there is no middle ground, $L_2 \leq K\epsilon^{-2} \ln n$.


Sprinkling for Complexity. Take $p_1 = (1 + \epsilon/2)/n$ and $p_2 \sim \epsilon/2n$ so that
$p_1 + p_2 - p_1p_2 = (1 + \epsilon)/n$. Let $G_1 \sim G(n, p_1)$, $G_2 \sim G(n, p_2)$ and $G_3 = G_1 \cup G_2$
so that $G_3 \sim G(n, (1 + \epsilon)/n)$. $G_1, G_3$ will have dominant components $V_1, V_3$ of
sizes $\sim n\epsilon$ and $\sim 2n\epsilon$. As $G_3$ has no middle ground in its component sizes, $V_1 \subset V_3$.
Now the sprinkling $G_2$ adds $\sim p_2\binom{|V_1|}{2} \sim n\epsilon^3/4$ edges internal to $V_1$. Thus $V_3$ will
have complexity at least $\sim n\epsilon^3/4 = \lambda^3/4$, which approaches infinity.


11.10 THE CRITICAL WINDOW 


We now fix a real $\lambda$ and set $p = 1/n + \lambda n^{-4/3}$. There has been massive study of this
critical window, Łuczak (1990b) and the monumental Janson, Knuth, Łuczak and
Pittel (1993) being only two examples. Calculations in this regime are remarkably
delicate.

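Fix $c > 0$ and let $X$ be the number of tree components of size $k = cn^{2/3}$. Then

\[ E[X] = \binom{n}{k} k^{k-2} p^{k-1} (1-p)^{k(n-k) + \binom{k}{2} - (k-1)}. \]

Recall that $\ln(1 + x) = x - \frac{1}{2}x^2 + O(x^3)$ and watch the terms cancel! By Stirling's
formula,

\[ \binom{n}{k} = \frac{n^k}{k!} \prod_{i=1}^{k-1}\left(1 - \frac{i}{n}\right) \sim \frac{n^k e^k}{k^k\sqrt{2\pi k}} \prod_{i=1}^{k-1}\left(1 - \frac{i}{n}\right), \]

and for $i < k$ we have $\ln(1 - i/n) = -i/n - i^2/(2n^2) + O(i^3/n^3)$ so that

\[ \sum_{i=1}^{k-1} \ln\left(1 - \frac{i}{n}\right) = -\frac{k^2}{2n} - \frac{k^3}{6n^2} + o(1) = -\frac{k^2}{2n} - \frac{c^3}{6} + o(1). \]

Considering $p^{k-1} = n^{1-k}(1 + \lambda n^{-1/3})^{k-1}$,

\[ (k-1)\ln(1 + \lambda n^{-1/3}) = k\lambda n^{-1/3} - \frac{1}{2}c\lambda^2 + o(1). \]

Last but not least,

\[ \ln(1-p) = -p + O(n^{-2}) = -\frac{1}{n} - \frac{\lambda}{n^{4/3}} + O(n^{-2}) \]

and

\[ k(n-k) + \binom{k}{2} - (k-1) = kn - \frac{k^2}{2} + O(n^{2/3}) \]

so that

\[ \left[k(n-k) + \binom{k}{2} - (k-1)\right]\ln(1-p) = -k + \frac{k^2}{2n} - \frac{\lambda k}{n^{1/3}} + \frac{\lambda c^2}{2} + o(1). \]

Putting it all together,

\[ E[X] \sim \frac{n^k k^{k-2}}{k^k\sqrt{2\pi k}\, n^{k-1}} e^A = n k^{-5/2} (2\pi)^{-1/2} e^A, \]

where

\[ A = k - \frac{k^2}{2n} - \frac{c^3}{6} + \frac{\lambda k}{n^{1/3}} - \frac{\lambda^2 c}{2} - k + \frac{k^2}{2n} - \frac{\lambda k}{n^{1/3}} + \frac{\lambda c^2}{2}. \]

The $k$ and $n$ terms cancel and we can give $A$ the intriguing form

\[ A = A(c) = \frac{(\lambda - c)^3 - \lambda^3}{6}. \]

Writing $k$ in terms of $n$ then yields

\[ E[X] \sim n^{-2/3} e^{A(c)} c^{-5/2} (2\pi)^{-1/2}. \]

For any particular such $k$, $E[X] \to 0$ but if we sum $k$ between $cn^{2/3}$ and $(c + dc)n^{2/3}$
we multiply by $n^{2/3}\,dc$. Going to the limit gives an integral: For any fixed $a, b$, let
$X$ be the number of tree components of size between $an^{2/3}$ and $bn^{2/3}$. Then

\[ \lim_{n \to \infty} E[X] = \int_a^b e^{A(c)} c^{-5/2} (2\pi)^{-1/2}\, dc. \]
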
The large components are not all trees. Wright (1977) proved that for fixed
$l \geq 0$ there are asymptotically $c_l k^{k-2+(3l/2)}$ connected graphs on $k$ points with
$k - 1 + l$ edges, where $c_0 = 1$, $c_1 = \sqrt{\pi/8}$ and $c_l$ was given by a specific recurrence.
Asymptotically in $l$, $c_l = l^{-l/2(1+o(1))}$. The calculation for $X^{(l)}$, the number of
such components on $k$ vertices, leads to extra factors of $c_l k^{3l/2}$ and $n^{-l}$, which gives
$c_l c^{3l/2}$. For fixed $a, b, \lambda, l$ the number $X^{(l)}$ of components of size between $an^{2/3}$
and $bn^{2/3}$ with $l - 1$ more edges than vertices satisfies

\[ \lim_{n \to \infty} E[X^{(l)}] = \int_a^b e^{A(c)} c^{-5/2} (2\pi)^{-1/2} (c_l c^{3l/2})\, dc, \]

and letting $X^*$ be the total number of components of size between $an^{2/3}$ and $bn^{2/3}$

\[ \lim_{n \to \infty} E[X^*] = \int_a^b e^{A(c)} c^{-5/2} (2\pi)^{-1/2} g(c)\, dc, \]

where

\[ g(c) = \sum_{l=0}^{\infty} c_l c^{3l/2}, \]

a sum convergent for all $c$. A component of size $\sim cn^{2/3}$ will have probability
$c_l c^{3l/2}/g(c)$ of having complexity $l$, independent of $\lambda$. As $\lim_{c \to 0} g(c) = 1$, most
components of size $\epsilon n^{2/3}$, $\epsilon \ll 1$, are trees but as $c$ gets bigger the distribution on $l$
moves inexorably higher.


An Overview. For any fixed $\lambda$, the sizes of the largest components are of the form
$cn^{2/3}$ with a distribution over the constant. This distribution has support the positive
reals. Thus, for example, for $\lambda = -4$ there is some positive limiting probability that
the largest component is bigger than $10n^{2/3}$ and for $\lambda = +4$ there is some positive
limiting probability that the largest component is smaller than $0.1n^{2/3}$, though both
these probabilities are minuscule. The $c^{-5/2}$ term dominates the integral as $c \to 0^+$,
reflecting the notion that for any fixed $\lambda$ there should be many components of size near
$\epsilon n^{2/3}$ for $\epsilon = \epsilon(\lambda)$ appropriately small. When $\lambda$ is large negative (e.g., $\lambda = -4$) the
largest component is likely to be $\epsilon n^{2/3}$, $\epsilon$ small, and there will be many components
of nearly that size. The nontree components will be a negligible fraction of the
tree components. When $\lambda$ is large positive (e.g., $\lambda = +4$) the dominant component
will have begun to emerge. The largest component is likely to be $\sim 2\lambda n^{2/3}$ and of
moderately high (not zero or one) complexity and the second largest component will
be considerably smaller and simple.

Now consider the evolution of $G(n, p)$ in terms of $\lambda$. Suppose that at a given $\lambda$
there are components of size $c_1n^{2/3}$ and $c_2n^{2/3}$. When we move from $\lambda$ to $\lambda + d\lambda$
there is a probability $c_1c_2\,d\lambda$ that they will merge. Components have a peculiar
gravitation in which the probability of merging is proportional to their sizes. With
probability $(c_1^2/2)\,d\lambda$ there will be a new internal edge in a component of size $c_1n^{2/3}$
so that large components rarely remain trees. Simultaneously, big components are
eating up other vertices.

With $\lambda = -4$, say, we have feudalism. Many small components (castles) are each
vying to be the largest. As $\lambda$ increases the components increase in size and a few
large components (nations) emerge. An already large France has much better chances
of becoming larger than a smaller Andorra. The largest components tend strongly to
merge and by $\lambda = +4$ it is very likely that a dominant component, a Roman Empire,
has emerged. With high probability this component is nevermore challenged for
supremacy but continues absorbing smaller components until full connectivity,
One World, is achieved.


11.11 ANALOGIES TO CLASSICAL PERCOLATION THEORY 


The study of percolation has involved the intense efforts of both mathematicians and
physicists for many years. A central object of that study has been bond percolation
on $Z^d$, as described below. Here we explore, without proofs, the fruitful analogies
between that percolation and the Erdős–Rényi phase transition. A classic text in this
field is Percolation by Grimmett (1999) and we shall follow its treatment.

Let $d \geq 2$. (All parameters below shall depend on the choice of $d$.) Let $Z^d$, as
usual, represent the set of $\vec{a} = (a_1, \ldots, a_d)$ with $a_i$ integers. The $d$-dimensional cubic
lattice, written $L^d$, is that graph with vertices $Z^d$, two vertices $\vec{a}, \vec{b}$ being adjacent
if they agree on $d - 1$ coordinates and differ by one on the other coordinate. Let
$p \in [0, 1]$. The random subgraph $L^d(p)$ contains each edge of $L^d$ (and no others)
with independent probability $p$. We let $C(\vec{a})$ denote the connected component of
$L^d(p)$ containing the vertex $\vec{a}$. We generally examine $C(\vec{0})$ as, by symmetry, all
$C(\vec{a})$ look the same. [In Grimmett (1999) and elsewhere the edges of $L^d$ are called
bonds and they are open with probability $p$ and closed otherwise. The word cluster
is used in place of connected component.] Naturally, as $p$ becomes larger $L^d(p)$ will
have more adjacencies. There is a critical probability, denoted by $p_c$, at which $L^d(p)$
undergoes a macroscopic change.


• For $p < p_c$, the subcritical region, all connected components are finite.

• For $p > p_c$, the supercritical region, there is precisely one infinite component.

• For $p = p_c$, at the critical point, the situation is particularly delicate, as
discussed below.


The constant probabilities of bond percolation correspond to the parametrized
probabilities $p = c/n$ in Erdős–Rényi’s $G(n, p)$. The value $c = 1$ is then the
critical probability in the Erdős–Rényi model. The infinite component in the bond
percolation model is analogous to giant components, components of size $\Omega(n)$, in
the Erdős–Rényi model. The finite components in the bond percolation model are
analogous to components of size $O(\ln n)$ in the Erdős–Rényi model.

The uniqueness of the infinite component in bond percolation was an open question
(though the physicists “knew” it was true!) for many years. It was proved by
Aizenman, Kesten and Newman (1987) and the Book Proof is given by Burton and
Keane (1989). This corresponds to the uniqueness of the giant component in $G(n, p)$.

In the bond percolation model there are only three choices for $p$: it can be less
than, greater than, or equal to $p_c$. The barely subcritical and barely supercritical
phases of the Erdős–Rényi model correspond to an asymptotic study of the bond
percolation model as $p$ approaches $p_c$ from below and from above, respectively. This
study is done through the use of critical exponents as described below.

Set $\theta(p) = \Pr[C(\vec{0}) \text{ is infinite}]$. For $p < p_c$, $\theta(p) = 0$ as with probability one
there are no infinite components. For $p > p_c$, $\theta(p) > 0$. This corresponds to the
infinite component having positive density, strengthening the analogy to the giant
components of the Erdős–Rényi model. When $p$ is barely greater than $p_c$ there will
be an infinite component but its density will be very small. The critical exponent $\beta$
is that real number so that

\[ \theta(p) = (p - p_c)^{\beta + o(1)} \quad \text{as } p \to p_c^+. \]

[As mathematicians, we are aware that $\theta(p)$ could behave erratically as $p \to p_c^+$ and
$\beta$ might not exist. This holds for all critical exponents we discuss. For a physicist,
there is no doubt that the critical exponents do exist, and they can tell you the values to
a few decimal places!] Analogously, in the Erdős–Rényi model the analogue of $\theta(p)$ is the
proportion of points in the giant component, that $y = y(c) > 0$ satisfying (11.2). From (11.5),
$y(1 + \epsilon) \sim 2\epsilon$ as $\epsilon \to 0^+$. Therefore $\beta = 1$.

The susceptibility, denoted by $\chi(p)$ (not to be confused with chromatic number),
is given by $\chi(p) = E[|C(\vec{0})|]$. For $p > p_c$, $\chi(p) = \infty$ as with positive probability
$C(\vec{0})$ is infinite. For $p < p_c$, $\chi(p)$ is finite and $\chi(p) \to \infty$ as $p \to p_c^-$. That
the susceptibility approaches infinity at the same critical value for which an infinite
component appears is not at all obvious and was one of the great developments of
the field, due independently to Aizenman and Barsky (1987) and Men’shikov (1986).
When $p$ is barely less than $p_c$, $\chi(p)$ will be finite but large. The critical number $\gamma$ is
that real number so that

\[ \chi(p) = (p_c - p)^{-\gamma + o(1)} \quad \text{as } p \to p_c^-. \]

Analogously, in the Erdős–Rényi model we examine $E[|C(v)|]$ in $G(n, (1 - \epsilon)/n)$.
In the subcritical region this is well mirrored by $T_{1-\epsilon}$, the total size of a subcritical
Poisson branching process. We find $E[T_{1-\epsilon}]$ by looking at each generation. There
is one root Eve, who has an expected number $1 - \epsilon$ children. They behave similarly
and so Eve has an expected number $(1 - \epsilon)^2$ grandchildren. This continues; there
are an expected number $(1 - \epsilon)^i$ nodes in the $i$th generation so that

\[ E[T_{1-\epsilon}] = \sum_{i=0}^{\infty} (1 - \epsilon)^i = \epsilon^{-1} \]

precisely. Therefore $\gamma = 1$.
While $\chi(p)$ is infinite in the supercritical region we can examine the “finite portion”
of $L^d(p)$. The finite susceptibility $\chi^f$ is given by

\[ \chi^f(p) = E[|C(\vec{0})| \mid C(\vec{0}) \text{ is finite}]. \]

When $p$ is barely greater than $p_c$, $\chi^f(p)$ will be finite but large. The critical number
$\gamma'$ is the real number satisfying

\[ \chi^f(p) = (p - p_c)^{-\gamma' + o(1)} \quad \text{as } p \to p_c^+. \]

The Erdős–Rényi analogue is $E[|C(v)|]$ in $G(n, (1 + \epsilon)/n)$, conditional on $v$ not
being in the giant component. In $G(n, (1 + \epsilon)/n)$, $|C(v)|$ has basically distribution
$T^{po}_{1+\epsilon}$ with the value $T^{po}_{1+\epsilon} = \infty$ corresponding to being in the giant component.
The finite analogue then corresponds to $T^{po}_{1+\epsilon}$, conditional on it being finite. The
probability that $T^{po}_{1+\epsilon}$ is finite approaches one as $\epsilon \to 0^+$. The Poisson branching
processes $T^{po}_{1+\epsilon}$, $T^{po}_{1-\epsilon}$ have nearly the same finite distribution. Conditioning on $v$ not
being in the giant component, $|C(v)|$ then behaves like $T^{po}_{1-\epsilon}$. Therefore $\gamma' = 1$.

At the critical value $p = p_c$, all components are finite. The distribution of $|C(\vec{0})|$
will have a heavy tail. The critical number $\delta$ is that real number so that at $p = p_c$

\[ \Pr[|C(\vec{0})| \geq s] = s^{-1/\delta + o(1)} \quad \text{as } s \to \infty. \]

For the Erdős–Rényi analogue we consider $|C(v)|$ in $G(n, 1/n)$. One needs to be
cautious about the double limit. For any fixed $s$,

\[ \lim_{n \to \infty} \Pr[|C(v)| \geq s] = \Pr[T^{po}_1 \geq s] = \Theta(s^{-1/2}) \]

from (11.3). Therefore $\delta = 2$.

We further examine the gap exponent, denoted by $\Delta$. In the subcritical region
the distribution of $|C(\vec{0})|$ drops off exponentially. For each $k \geq 1$ it has a finite $k$th
moment. The hypothetical quantity $\Delta$ is such that

\[ \frac{E[|C(\vec{0})|^{k+1}]}{E[|C(\vec{0})|^{k}]} = (p_c - p)^{-\Delta + o(1)}. \]

The belief is that $\Delta$ does not depend on the choice of $k$. In the supercritical region
the belief is that the same asymptotics hold when the infinite component is erased.
More precisely, the belief is that

\[ \frac{E[|C(\vec{0})|^{k+1} \mid C(\vec{0}) \text{ is finite}]}{E[|C(\vec{0})|^{k} \mid C(\vec{0}) \text{ is finite}]} = (p - p_c)^{-\Delta + o(1)} \]

for all $k \geq 1$. In the Erdős–Rényi analogue the distribution of $|C(v)|$ in $G(n, (1 - \epsilon)/n)$
mirrors that of $T^{po}_{1-\epsilon}$. [The supercritical $G(n, (1 + \epsilon)/n)$, with its giant component
erased, behaves similarly.] From Section 11.4, $\Pr[T^{po}_{1-\epsilon} = s]$ drops like $s^{-3/2}$ until
$s$ reaches $\Theta(\epsilon^{-2})$ when it begins its exponential drop-off. The region of exponential
drop-off has negligible effect on the finite moments. The $k$th moment of $T^{po}_{1-\epsilon}$ is
basically the sum of $s^{-3/2}s^k$ for $s = O(\epsilon^{-2})$, which is of order $(\epsilon^{-2})^{k-1/2}$, or
$\epsilon^{1-2k}$. The ratio of the $(k + 1)$st and $k$th moments is then $\Theta(\epsilon^{-2})$. Therefore
$\Delta = 2$.
For bond percolation in $Z^d$ define the triangle function

\[ T(p) = \sum_{\vec{x}, \vec{y} \in Z^d} \Pr[\vec{0} \leftrightarrow \vec{x}] \Pr[\vec{0} \leftrightarrow \vec{y}] \Pr[\vec{x} \leftrightarrow \vec{y}], \]

where $\vec{x} \leftrightarrow \vec{y}$ means that $\vec{x}, \vec{y}$ lie in the same component. In Aizenman and Newman
(1984) the following condition was introduced.


The Triangle Condition: $T(p_c) < \infty$.


They showed that when the triangle condition holds some of the conditions for mean
field theory (as discussed below) are valid. For the Erdős–Rényi percolation analogue
we fix a vertex $v$ of $G(n, p)$ and define the discrete triangle function

\[ T(p) = \sum_{x, y} \Pr[v \leftrightarrow x] \Pr[v \leftrightarrow y] \Pr[x \leftrightarrow y]. \]

The critical probability $p_c$ is replaced by $p = n^{-1}$. Finiteness is replaced by
boundedness, giving the following.


The Discrete Triangle Condition: $T(p) = O(1)$.


The contribution to $T(p)$ when two or three of $v, x, y$ are equal is easily bounded,
leaving the contribution from all triples $v, x, y$ of distinct vertices. As all pairs behave
the same and there are $(n - 1)(n - 2) \sim n^2$ terms,

\[ T(p) \sim O(1) + n^2 \Pr[v \leftrightarrow x]^3 \]

and

\[ \Pr[v \leftrightarrow x] = \sum_t \frac{t-1}{n-1} \Pr[|C(v)| = t] \sim n^{-1} \sum_t t \Pr[|C(v)| = t]. \]

We know that $\Pr[|C(v)| = t]$ behaves like $t^{-3/2}$ until $t$ reaches $\Theta(n^{2/3})$ and then
drops off exponentially. Ignoring constants,

\[ \sum_t t \Pr[|C(v)| = t] = \Theta\left(\sum_{t = O(n^{2/3})} t \cdot t^{-3/2}\right) = \Theta\left((n^{2/3})^{1/2}\right). \]

Now $\Pr[v \leftrightarrow x] = \Theta(n^{-2/3})$. [Basically, the main contribution to $\Pr[v \leftrightarrow x]$ comes
when $v$ lies in a component of size $\Theta(n^{2/3})$, even though that rarely occurs.] The
triangle condition does hold as

\[ T(p) = O(1) + \Theta(n^2)\,\Theta(n^{-2/3})^3 = O(1). \]

The discrete triangle condition does not hold in the barely supercritical region. There
$\Pr[v \leftrightarrow x]$ is dominated by the probability that both $v, x$ lie in the dominant
component. As the dominant component has size $\gg n^{2/3}$, $\Pr[v \leftrightarrow x] \gg n^{-2/3}$, and
$T(p) \gg 1$. This is not mere serendipity. Rather, the boundedness of $T(p)$ provides a
natural boundary between the critical window and the barely supercritical region for
discrete random structures. This connection is explored in depth in Borgs, Chayes,
van der Hofstad, Slade and Spencer (2005) and the recent lecture notes of Slade
(2006).

Hara and Slade (1990) [see also the survey by Hara and Slade (1994)] proved
that the triangle condition holds in the bond percolation model for sufficiently high
dimensions $d$. [More precisely, they showed that $T(p)$ could be made very small by
taking $p$ slightly less than $p_c$. Their argument works for $d \geq 19$ and for all $d > 6$
with a somewhat different model. It is strongly believed that the condition does hold
for all $d > 6$.] Building on that, they found that the critical exponent values $\beta = 1$,
$\gamma = \gamma' = 1$, $\delta = 2$ and $\Delta = 2$ hold for those $d$. Mathematical physicists have a term
mean field, which, quoting Grimmett, “permits several interpretations depending on
context.” A commonly held requirement is that the critical exponents have the values
given above. Thus bond percolation for $d \geq 19$ is regarded as exhibiting mean field
behavior. Using the analogues described above it seems reasonable to say that the
Erdős–Rényi model exhibits mean field behavior.


11.12 EXERCISES 


1. Consider the Poisson branching model with mean c = 1 and root Eve. For 
n > 3 let A, be the event that Eve has precisely two children, Dana and Fan, 
and that the total tree size JT’ = n. Let X be the size of the subtree with root 
Dana. For each j > 1 find limp_so. Pr [X = j | An]. Find an asymptotic 
formula for Pr [n/3 < X < 2n/3]. 


2. Consider the binomial branching model with parameters m,p and mp > 1. 
Set y = y(m,p) = Pr[T = ov]. Give an implicit equation for y analogous 
to (11.2). With 7 fixed set mp = (1 + €). Find 


lim y(m, p) 


«0+ € 


3. Letc > 1. Let Z;,4 = 1,2,... be independent Poisson variables with mean c. 
For a > 1 consider the walk defined by initial condition ¥; = a and recursion 
Y, = Y;_-1 + Z; — 1 for t > 2. Use Chernoff bounds to show 


lim S~ Pr[¥; < 0] =0. 


t>2 


Use this to show that the random walk defined by initial condition Yo = 1 and 
recursion Y; = Y;-1 + Z — 1 for t > 1 has a positive probability of being 
positive for all ¢. 


4. An Open-ended Computer Experiment. Begin with vertices 1,...,n ($n = 10^6$ 
is very quick when done right) and no edges. Each round pick two random 
vertices and add an edge between them. Use a UNION-FIND algorithm to keep 
track of the components and the component sizes. Parametrize round number 
E by $E/\binom{n}{2} = 1/n + \lambda n^{-4/3}$ and concentrate on the region $-4 \le \lambda \le +4$. 
Update the ten largest component sizes, noting particularly when two of the 
ten largest components merge. Watch the barely subcritical picture at $\lambda = -4$ 
turn into a barely supercritical picture at $\lambda = +4$ as the bulk of the moderate 
size components merge to form a dominant component. 
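
One possible starting point for this experiment is sketched below. It is only an illustration under assumptions: the function names `find` and `component_sizes` are hypothetical, vertices are numbered 0,...,n-1, the two endpoints of a round are drawn with replacement (so an occasional loop is added and ignored), and the run is simply repeated for each value of $\lambda$ rather than updated incrementally.

```python
import random

def find(parent, v):
    """Return the root of v, halving the path on the way up."""
    while parent[v] != v:
        parent[v] = parent[parent[v]]
        v = parent[v]
    return v

def component_sizes(n, lam, seed=0):
    """Run E rounds, where E / C(n,2) = 1/n + lam * n**(-4/3), and return
    all component sizes in decreasing order."""
    rng = random.Random(seed)
    parent = list(range(n))
    size = [1] * n
    pairs = n * (n - 1) // 2
    rounds = max(0, int(pairs * (1.0 / n + lam * n ** (-4.0 / 3.0))))
    for _ in range(rounds):
        u, v = rng.randrange(n), rng.randrange(n)
        ru, rv = find(parent, u), find(parent, v)
        if ru == rv:
            continue  # edge inside one component (or a loop): nothing merges
        if size[ru] < size[rv]:
            ru, rv = rv, ru  # union by size: hang the smaller tree below the larger
        parent[rv] = ru
        size[ru] += size[rv]
    return sorted((size[r] for r in range(n) if parent[r] == r), reverse=True)
```

Calling `component_sizes(n, lam, seed)[:10]` for `lam` in (-4, 0, 4) with a common seed gives the ten largest components along one and the same edge sequence, since a fixed seed reproduces the same rounds.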


THE PROBABILISTIC LENS: 
The Rich Get Richer 


Consider two bins, each of which initially has one ball. At each time u = 1,2,... 
we add one ball to one of the bins. The ball is placed randomly, in proportion to 
the square of the number of balls already in the bin. [For example, if the bins have 
5 balls and 3 balls, respectively, the next ball is placed in the bin with 5 balls with 
probability 25/(25 + 9).] 


Theorem 1 With probability one, one of the bins will get all but a finite number of 
the balls. 


Proof. We move to a continuous time model. Let $X_i$ be independent random 
variables, $X_i$ having the exponential distribution with mean $i^{-2}$. (That is, $X_i$ has 
density function $i^2 e^{-i^2 t}$ for $t \ge 0$.) At time zero the first bin has one ball. It receives 
its second ball at time $X_1$. In general, it receives its $i$th ball time $X_{i-1}$ after receiving 
its $(i-1)$st ball. Let $X_i'$ also be independent exponential distributions with mean 
$i^{-2}$, independently chosen from the $X_i$. The second bin receives its balls according 
to the $X_i'$. The process ends when an infinite number of balls have been placed. The 
fictitious continuation, of defining the $X_i, X_i'$ for all $i \ge 1$, shall be helpful in the 
analysis. 

We use two basic properties of exponential distributions. Both are easy calculus 
exercises. 


Proposition 2 Let X be exponential with mean $\mu$ and let a > 0. Then X - a, 
conditional on X > a, is also exponential with mean $\mu$. 


This is often called the forgetfulness property. 

Proposition 3 Let X, X' be independent exponentials with means $\mu$, $\nu$, respectively. 
Then 


$$\Pr[\min\{X, X'\} = X] = \frac{\mu^{-1}}{\mu^{-1} + \nu^{-1}}.$$


The continuous time process mirrors the sequential process. Clearly the first ball 
is equally likely to go into either of the two bins. Suppose at some time t > 0 the first 
(say) bin has just received its ith ball and the second bin last received its jth ball at 
time $t' < t$. (When the second bin has not yet received its second ball set $j = 1$ and 
$t' = 0$.) The waiting time for the first bin is then $X_i$. The waiting time for the second 
was $X_j'$ at time $t'$. By the forgetfulness property its conditional waiting time at time 
$t$ is (still written) $X_j'$, exponential with mean $j^{-2}$. The next ball goes into the first bin if and only 
if $\min\{X_i, X_j'\} = X_i$, which occurs with probability $i^2/(i^2 + j^2)$ as desired. 

Let $T = \sum_{i=1}^{\infty} X_i$, $T' = \sum_{i=1}^{\infty} X_i'$ be the total times for the bins to receive (under 
fictitious continuation) an infinite number of balls. As $E[X_i] = E[X_i'] = i^{-2}$ and 
(critically!) $\sum_{i=1}^{\infty} i^{-2}$ converges, both $T, T'$ have finite means and so are finite with 
probability one. As sums of independent continuous distributions $\Pr[T = T'] = 0$. 
Suppose $T < T'$, the other case being identical. At time $T$ the first bin has received 
an infinite number of balls. The second bin has not. Therefore the second bin has 
received only a finite number of balls! ∎
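
The theorem can be watched in a simulation of the original discrete process. The sketch below is an illustration only: the function names are hypothetical, and a finite run can of course only suggest, not prove, that the smaller bin stops receiving balls.

```python
import random

def rich_get_richer(steps, seed=0):
    """Simulate the two-bin process; return the final counts and the bin
    (0 or 1) that received the ball at each step."""
    rng = random.Random(seed)
    bins = [1, 1]
    history = []
    for _ in range(steps):
        w0, w1 = bins[0] ** 2, bins[1] ** 2
        # Bin 0 is chosen with probability w0 / (w0 + w1).
        k = 0 if rng.random() * (w0 + w1) < w0 else 1
        bins[k] += 1
        history.append(k)
    return bins, history

def last_step_of_smaller_bin(bins, history):
    """Last time (1-based) at which the bin that ends up smaller received
    a ball, or 0 if it never received one."""
    loser = 0 if bins[0] < bins[1] else 1
    times = [u for u, k in enumerate(history, start=1) if k == loser]
    return times[-1] if times else 0
```

Running `rich_get_richer` for increasing values of `steps` with a fixed seed and printing `last_step_of_smaller_bin` typically shows the last arrival time in the smaller bin freezing early, which is the behavior Theorem 1 predicts.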


Circuit Complexity 


It is not knowledge, but the act of learning, not possession but the act of getting 
there, which grants the greatest enjoyment. When I have clarified and exhausted 
a subject, then I turn away from it, in order to go into darkness again; the 
never-satisfied man is so strange — if he has completed a structure then it is not 
in order to dwell in it peacefully, but in order to begin another. I imagine the 
world conqueror must feel thus, who, after one kingdom is scarcely conquered, 
stretches out his arms for another. 


- Karl Friedrich Gauss 


12.1 PRELIMINARIES 


A Boolean function $f = f(x_1,\ldots,x_n)$ on the $n$ variables $x_1, x_2, \ldots, x_n$ is simply 
a function $f : \{0,1\}^n \to \{0,1\}$. In particular, $0$, $1$, $x_1 \wedge \cdots \wedge x_n$, $x_1 \vee \cdots \vee x_n$ 
and $x_1 \oplus \cdots \oplus x_n$ denote, as usual, the two constant functions, the AND function 
(whose value is 1 iff $x_i = 1$ for all $i$), the OR function (whose value is 0 iff $x_i = 0$ 
for all $i$) and the parity function (whose value is 0 iff an even number of variables 
$x_i$ is 1), respectively. For a function $f$, we let $\bar{f} = f \oplus 1$ denote its complement 
NOT $f$. The functions $x_i$ and $\bar{x}_i$ are called atoms. In this section we consider the 
problem of computing various Boolean functions efficiently. A circuit is a directed, 
acyclic graph, with a special vertex with no outgoing edges called the output vertex. 
Every vertex is labeled by a Boolean function of its immediate parents and the 
vertices with no parents (i.e., those with no ingoing edges) are labeled either by 
one of the variables x; or by a constant 0 or 1. For every assignment of binary 
values to each variable x; one can compute, recursively, the corresponding value of 
each vertex of the circuit by applying the corresponding function labeling it to the 
already computed values of its parents. We say that the circuit computes the function 
$f = f(x_1,\ldots,x_n)$ if for each $x_i \in \{0,1\}$, the corresponding value of the output 
vertex of the circuit equals $f(x_1,\ldots,x_n)$. For example, Figure 12.1 presents a circuit 
computing $f(x_1,x_2,x_3) = (x_1 \oplus (x_2 \wedge x_3)) \wedge x_1$. 


Fig. 12.1 A binary circuit for $f(x_1,x_2,x_3) = (x_1 \oplus (x_2 \wedge x_3)) \wedge x_1$. 
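
The recursive evaluation described above can be sketched in a few lines. The encoding below (a dictionary from vertex names to a gate and its parents) is a hypothetical one chosen for illustration, and the wiring of `CIRCUIT` is an assumption read off from the formula $(x_1 \oplus (x_2 \wedge x_3)) \wedge x_1$, not from the figure itself.

```python
from itertools import product

GATES = {
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
}

# vertex -> (gate, parents); input vertices have gate None and no parents.
CIRCUIT = {
    "x1": (None, ()), "x2": (None, ()), "x3": (None, ()),
    "g1": ("and", ("x2", "x3")),
    "g2": ("xor", ("x1", "g1")),
    "out": ("and", ("g2", "x1")),
}

def evaluate(circuit, output, assignment):
    """Compute the value of `output` from the values of its parents, recursively."""
    values = dict(assignment)
    def value(v):
        if v not in values:
            gate, parents = circuit[v]
            values[v] = GATES[gate](*(value(p) for p in parents))
        return values[v]
    return value(output)

def depth(circuit, v):
    """Maximum number of edges on a directed path ending at v."""
    _, parents = circuit[v]
    return 0 if not parents else 1 + max(depth(circuit, p) for p in parents)

def is_formula(circuit):
    """True iff every vertex has fanout at most one."""
    fanout = {v: 0 for v in circuit}
    for _, parents in circuit.values():
        for p in parents:
            fanout[p] += 1
    return all(k <= 1 for k in fanout.values())
```

In this encoding the circuit has size `len(CIRCUIT)`, and `is_formula(CIRCUIT)` is false because the vertex `x1` feeds two gates.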


If every fanout in a circuit is at most one (i.e., the corresponding graph is a tree) 
the circuit is called a formula. If every fanin in a circuit is at most two the circuit 
is called a binary circuit. Therefore the circuit in Figure 12.1 is binary, but it is 
not a formula. The size of a circuit is the number of vertices in it and its depth is 
the maximum length (number of edges) of a directed path in it. The binary circuit 
complexity of a Boolean function is the size of the smallest binary circuit computing 
it. An easy counting argument shows that for large $n$ the binary circuit complexity 
of almost all the functions of $n$ variables is at least $(1 + o(1))2^n/n$. This is because 
the number of binary circuits of size $s$ on $n$ variables can be shown to be less than 
$(c(s + n))^s$, whereas the total number of Boolean functions on $n$ variables is $2^{2^n}$. 
On the other hand, there is no known nonlinear, not to mention exponential (in 
$n$), lower bound for the binary circuit complexity of any “explicit” function. By 
“explicit” here we mean an NP-function, that is, one of a family $\{f_{n_i}\}_{i \ge 1}$ of Boolean 
functions, where $f_{n_i}$ has $n_i$ variables, $n_i \to \infty$, and there is a nondeterministic 
Turing machine that, given $n_i$ and $x_1,\ldots,x_{n_i}$, can decide in (nondeterministic) 
polynomial time (in $n_i$) if $f_{n_i}(x_1,\ldots,x_{n_i}) = 1$. [An example for such a family 
is the $(n/2)$-clique function; here $n_i = \binom{i}{2}$, the $n_i$ variables $x_1,\ldots,x_{n_i}$ represent 
the edges of a graph on $i$ vertices and $f_{n_i}(x_1,\ldots,x_{n_i}) = 1$ iff the corresponding 
graph contains a clique on at least $i/2$ vertices.] Any nonpolynomial lower bound 
for the binary circuit complexity of an explicit function would imply (among other 
things) that P ≠ NP and thus solve the arguably most important open problem in 
theoretical computer science. Unfortunately, at the moment, the best known lower 
bound for the binary circuit complexity of an explicit function of n variables is only 
3n [see Blum (1984), Paul (1977)]. However, several nontrivial lower bounds are 
known when we impose certain restrictions on the structure of the circuits. Most 
of the known proofs of these bounds rely heavily on probabilistic methods. In this 
chapter we describe some of these results. We note that there are many additional 
beautiful known results about circuit complexity; see, for example, Wegener (1987) 
and Karchmer and Wigderson (1990), but those included here are not only among 
the crucial ones, but also represent the elegant methods used in this field. Since most 
results in this chapter are asymptotic we assume, throughout the chapter, whenever 
it is needed, that the number of variables we have is sufficiently large. 
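
The counting argument mentioned in this section can be written out as a short calculation. This is a sketch under the stated assumption that at most $(c(s+n))^s$ circuits have size $s$ for some absolute constant $c$; the parameter $\epsilon$ is introduced here only for the illustration.

```latex
% Fix \epsilon > 0 and put s = (1-\epsilon) 2^n / n. For large n we have
% s + n \le 2 \cdot 2^n / n, so \log_2 (c(s+n)) \le n - \log_2 n + \log_2(2c) \le n, and
\log_2 \bigl( c(s+n) \bigr)^s \;=\; s \,\log_2 \bigl( c(s+n) \bigr)
   \;\le\; (1-\epsilon)\,\frac{2^n}{n}\cdot n \;=\; (1-\epsilon)\,2^n .
% Summing over all sizes up to s costs at most a further factor s. Hence the
% fraction of the 2^{2^n} Boolean functions computed by circuits of size at most s is at most
\frac{s \cdot 2^{(1-\epsilon) 2^n}}{2^{2^n}} \;=\; s \cdot 2^{-\epsilon 2^n} \;\longrightarrow\; 0 .
```

Since $\epsilon > 0$ is arbitrary, almost all functions need circuits of size at least $(1+o(1))2^n/n$.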


12.2 RANDOM RESTRICTIONS AND BOUNDED-DEPTH CIRCUITS 


Let us call a Boolean function G a t-AND-OR if it can be written as an AND of 
an arbitrary number of functions, each being an OR of at most $t$ atoms; that is, 
$G = G_1 \wedge \cdots \wedge G_w$, where $G_i = y_{i1} \vee \cdots \vee y_{ia_i}$, $a_i \le t$ and each $y_{ij}$ is an atom. 
Similarly, we call a Boolean function an s-OR-AND, if it can be written as an OR of 
AND gates each containing at most $s$ atoms. A minterm of a function is a minimal 
assignment of values to some of the variables that forces the function to be 1. Its size 
is the number of variables whose values are set. Note that a function is an s-OR-AND 
if and only if each of its minterms is of size at most $s$. A restriction is a map 
$\rho$ of the set of indices $\{1,\ldots,n\}$ to the set $\{0,1,*\}$. The restriction of the function 
$G = G(x_1,\ldots,x_n)$ by $\rho$, denoted by $G|_\rho$, is the Boolean function obtained from $G$ 
by setting the value of each $x_i$ for $i \in \rho^{-1}\{0,1\}$ to $\rho(i)$, and leaving each $x_j$ for 
$j \in \rho^{-1}(*)$ as a variable. Thus, for example, if $G(x_1,x_2,x_3) = (x_1 \wedge x_2) \vee x_3$ and 
$\rho(1) = 0$, $\rho(2) = \rho(3) = *$, then $G|_\rho = x_3$. For $0 \le p \le 1$, a random $p$-restriction 
is a random restriction $\rho$ defined by choosing for each $1 \le i \le n$ independently the 
value of $\rho(i)$ according to the following distribution: 


$$\Pr[\rho(i) = *] = p, \qquad \Pr[\rho(i) = 0] = \Pr[\rho(i) = 1] = (1-p)/2. \qquad (12.1)$$
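
The definitions of restriction and minterm can be tried out on small functions. The following is a brute-force sketch, practical only for a handful of variables; the representation of a function as a Python callable on a dictionary `{index: bit}` and all function names are assumptions made for this illustration.

```python
import random
from itertools import combinations, product

STAR = "*"

def random_restriction(n, p, rng):
    """Sample rho(i), i = 1..n, independently according to (12.1)."""
    rho = {}
    for i in range(1, n + 1):
        u = rng.random()
        rho[i] = STAR if u < p else (0 if u < p + (1 - p) / 2 else 1)
    return rho

def restrict(g, rho):
    """Return the free indices and the restricted function G|rho."""
    free = sorted(i for i, v in rho.items() if v == STAR)
    def g_rho(values):
        full = {i: (values[i] if v == STAR else v) for i, v in rho.items()}
        return g(full)
    return free, g_rho

def forces_one(g, free, partial):
    """True iff every completion of the partial assignment gives value 1."""
    rest = [i for i in free if i not in partial]
    return all(g({**partial, **dict(zip(rest, bits))}) == 1
               for bits in product((0, 1), repeat=len(rest)))

def minterms(g, free):
    """All minimal partial assignments that force g to 1, by increasing size."""
    found = []
    for k in range(len(free) + 1):
        for subset in combinations(free, k):
            for bits in product((0, 1), repeat=k):
                partial = dict(zip(subset, bits))
                # Skip assignments that extend an already found minterm.
                if any(all(partial.get(i) == b for i, b in m.items()) for m in found):
                    continue
                if forces_one(g, free, partial):
                    found.append(partial)
    return found
```

For the example in the text, `restrict(lambda x: (x[1] & x[2]) | x[3], {1: 0, 2: "*", 3: "*"})` yields a function on the free indices 2 and 3 whose only minterm sets $x_3 = 1$.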


Improving the results of Furst, Saxe and Sipser (1984), Ajtai (1983) and Yao 
(1985), Håstad (1988) proved the following result, which is very useful in establishing 
lower bounds for bounded-depth circuits. 


Lemma 12.2.1 [The Switching Lemma] Let $G = G(x_1,\ldots,x_n)$ be a t-AND-OR, 
that is, $G = G_1 \wedge G_2 \wedge \cdots \wedge G_w$, where each $G_i$ is an OR of at most $t$ atoms. Let $\rho$ 
be the random restriction defined by (12.1). Then 


$$\Pr[G|_\rho \text{ is not an } (s-1)\text{-OR-AND}] = \Pr[G|_\rho \text{ has a minterm of size} \ge s] \le (5pt)^s.$$

Proof. Let $E_s$ be the event that $G|_\rho$ has a minterm of size at least $s$. To bound 
$\Pr[E_s]$, we prove a stronger result; for any Boolean function $F$, 


$$\Pr[E_s \mid F|_\rho \equiv 1] \le (5pt)^s. \qquad (12.2)$$


Here we agree that if the condition is unsatisfied then the conditional probability is 
0. Lemma 12.2.1 is obtained from (12.2) by taking $F \equiv 1$. We prove (12.2) by 
induction on $w$. For $w = 0$, $G \equiv 1$ and there is nothing to prove. Assuming (12.2) 
holds whenever the number of $G_i$ is less than $w$, we prove it for $w$. Put $G = G_1 \wedge G^*$, 
where $G^* = G_2 \wedge \cdots \wedge G_w$, and let $E_s^*$ be the event that $G^*|_\rho$ has a minterm of 
size at least $s$. By interchanging, if necessary, some of the variables with their 
complements we may assume, for convenience, that $G_1 = \bigvee_{i \in T} x_i$, where $|T| \le t$. 
Either $G_1|_\rho \equiv 1$ or $G_1|_\rho \not\equiv 1$. In the former case, $E_s$ holds if and only if $E_s^*$ holds 
and hence, by induction, 


$$\Pr[E_s \mid F|_\rho \equiv 1, G_1|_\rho \equiv 1] = \Pr[E_s^* \mid (F \wedge G_1)|_\rho \equiv 1] \le (5pt)^s. \qquad (12.3)$$


The case $G_1|_\rho \not\equiv 1$ requires more work. In this case, any minterm of $G|_\rho$ must 
assign a value 1 to at least one $x_i$, for $i \in T$. For a nonempty $Y \subseteq T$ and for a 
function $\sigma : Y \to \{0,1\}$ that is not identically 0, let $E_s(Y,\sigma)$ be the event that $G|_\rho$ 
has a minterm of size at least $s$ which assigns the value $\sigma(i)$ to $x_i$ for each $i \in Y$ and 
does not assign any additional values to variables $x_j$ with $j \in T$. By the preceding 
remark, 


$$\Pr[E_s \mid F|_\rho \equiv 1, G_1|_\rho \not\equiv 1] \le \sum_{Y,\sigma} \Pr[E_s(Y,\sigma) \mid F|_\rho \equiv 1, G_1|_\rho \not\equiv 1]. \qquad (12.4)$$


Observe that the condition $G_1|_\rho \not\equiv 1$ means precisely that $\rho(i) \in \{0, *\}$ for all $i \in T$ 
and hence, for each $i \in T$, 


$$\Pr[\rho(i) = * \mid G_1|_\rho \not\equiv 1] = \frac{p}{p + (1-p)/2} = \frac{2p}{1+p}.$$


Thus, if $|Y| = y$, 


$$\Pr[\rho(Y) = * \mid G_1|_\rho \not\equiv 1] \le \left(\frac{2p}{1+p}\right)^y.$$


The further condition $F|_\rho \equiv 1$ can only decrease this probability. This can be shown 
using the FKG Inequality (see Chapter 6). It can also be shown directly as follows. 
For any fixed $\rho' : N - Y \to \{0,1,*\}$, where $N = \{1,\ldots,n\}$, we claim that 


$$\Pr[\rho(Y) = * \mid F|_\rho \equiv 1, G_1|_\rho \not\equiv 1, \rho|_{N-Y} = \rho'] \le \left(\frac{2p}{1+p}\right)^y.$$


Indeed, the given $\rho'$ has a unique extension $\rho$ with $\rho(Y) = *$. If that $\rho$ does not 
satisfy the above conditions, then the conditional probability is zero. If it does, then 
so do all extensions $\rho$ with $\rho(i) \in \{0,*\}$ for $i \in Y$, and so the inequality holds in 
this case too. As this holds for all fixed $\rho'$ we conclude that indeed 


$$\Pr[\rho(Y) = * \mid F|_\rho \equiv 1, G_1|_\rho \not\equiv 1] \le \left(\frac{2p}{1+p}\right)^y \le (2p)^y. \qquad (12.5)$$


Let $\rho' : T \to \{0,*\}$ satisfy $\rho'(Y) = *$ and consider all possible restrictions 
$\rho$ satisfying $\rho|_T = \rho'$. Under this condition, $\rho$ may be considered as a random 
restriction on $N - T$. The event $F|_\rho \equiv 1$ reduces to the event $\tilde{F}|_{\rho|_{N-T}} \equiv 1$, where $\tilde{F}$ 
is the AND of all functions obtained from $F$ by substituting the values of $x_i$ according 
to $\rho'$ for those $i \in T$ with $\rho'(i) = 0$, and by taking all possibilities for all the other 
variables $x_j$ for $j \in T$. If the event $E_s(Y,\sigma)$ occurs then $G^*|_{\rho\sigma}$ has a minterm of 
size at least $s - y$ that does not contain any variable $x_i$ with $i \in T - Y$. But this 
happens if and only if $\tilde{G}|_{\rho|_{N-T}}$ has a minterm of size at least $s - y$, where $\tilde{G}$ is the 
function obtained from $G^*$ by substituting the values of $x_j$ for $j \in Y$ according to 
$\sigma$, the values of $x_i$ for $i \in T - Y$ and $\rho'(i) = 0$ according to $\rho'$ and by removing all 
the variables $x_k$ with $k \in T - Y$ and $\rho'(k) = *$. Denoting this event by $\tilde{E}_{s-y}$ we 
can apply induction and obtain 


$$\Pr[E_s(Y,\sigma) \mid F|_\rho \equiv 1, G_1|_\rho \not\equiv 1, \rho|_T = \rho'] \le \Pr[\tilde{E}_{s-y} \mid \tilde{F}|_\rho \equiv 1] \le (5pt)^{s-y}.$$


Since any $\rho$ with $F|_\rho \equiv 1$, $G_1|_\rho \not\equiv 1$, $\rho(Y) = *$ must have $\rho|_T = \rho'$ for some $\rho'$ of 
this form, and since the event $E_s(Y,\sigma)$ may occur only if $\rho(Y) = *$ we conclude that 


$$\Pr[E_s(Y,\sigma) \mid F|_\rho \equiv 1, G_1|_\rho \not\equiv 1, \rho(Y) = *] \le (5pt)^{s-y},$$

and, by (12.5), 

$$\Pr[E_s(Y,\sigma) \mid F|_\rho \equiv 1, G_1|_\rho \not\equiv 1] = \Pr[\rho(Y) = * \mid F|_\rho \equiv 1, G_1|_\rho \not\equiv 1] \cdot \Pr[E_s(Y,\sigma) \mid F|_\rho \equiv 1, G_1|_\rho \not\equiv 1, \rho(Y) = *] \le (2p)^y (5pt)^{s-y}.$$


Substituting in (12.4) and using the fact that $|T| \le t$ and that 


$$\sum_{y=1}^{t} \binom{t}{y} (2^y - 1) \left(\frac{2}{5t}\right)^y \le \sum_{y=1}^{\infty} \frac{(4/5)^y - (2/5)^y}{y!} = e^{4/5} - e^{2/5} < 1,$$


we obtain 


$$\Pr[E_s \mid F|_\rho \equiv 1, G_1|_\rho \not\equiv 1] \le \sum_{y=1}^{|T|} \binom{|T|}{y} (2^y - 1)(2p)^y (5pt)^{s-y} \le (5pt)^s \sum_{y=1}^{t} \binom{t}{y} (2^y - 1) \left(\frac{2}{5t}\right)^y \le (5pt)^s.$$


This, together with (12.3), gives 

$$\Pr[E_s \mid F|_\rho \equiv 1] \le (5pt)^s,$$

completing the induction and the proof. ∎
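
The last step of the proof rests on the numerical fact that $\sum_{y=1}^{t} \binom{t}{y}(2^y - 1)(2/(5t))^y < 1$ for every $t$. A quick check of this fact is sketched below; the function names are hypothetical, and the closed form used for comparison is the binomial theorem applied to the two halves of the sum.

```python
from math import comb, exp

def switching_sum(t):
    """Sum over y = 1..t of C(t, y) * (2^y - 1) * (2 / (5t))^y."""
    return sum(comb(t, y) * (2 ** y - 1) * (2 / (5 * t)) ** y
               for y in range(1, t + 1))

def closed_form(t):
    """Binomial theorem: the same sum equals (1 + 4/(5t))^t - (1 + 2/(5t))^t."""
    return (1 + 4 / (5 * t)) ** t - (1 + 2 / (5 * t)) ** t

# Bound obtained from C(t, y) <= t^y / y!; it is strictly below 1.
LIMIT = exp(4 / 5) - exp(2 / 5)
```

For every $t$ the sum stays below `LIMIT`, which is itself below 1, so the factor multiplying $(5pt)^s$ never exceeds 1.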


By taking the complement of the function G in Lemma 12.2.1 and applying De 
Morgan’s rules one clearly obtains its dual form: If G is a t-OR-AND and p is the 
random restriction given by (12.1) then 


$$\Pr[G|_\rho \text{ is not an } (s-1)\text{-AND-OR}] \le (5pt)^s.$$


We now describe an application of the switching lemma that supplies a lower bound 
to the size of circuits of small depth that compute the parity function $x_1 \oplus \cdots \oplus x_n$. 
We consider circuits in which the vertices are arranged in levels; those in the first 
level are atoms (i.e., variables or their complements) and each other gate is either 
an OR or an AND of an arbitrary number of vertices from the previous level. We 
assume that the gates in each level are either all AND gates or all OR gates, and 
that the levels alternate between AND levels and OR levels. A circuit of this form 
is called a $C(s, s', d, t)$-circuit if it contains at most $s$ gates, at most $s'$ of which are 
above the second level, its depth is at most $d$ and the fanin of each gate in its second 
level is at most $t$. Thus, for example, the circuit that computes the parity function by 
computing an OR of the $2^{n-1}$ terms $x_1^{\epsilon_1} \wedge \cdots \wedge x_n^{\epsilon_n}$, where $(\epsilon_1,\ldots,\epsilon_n)$ ranges over 
all even binary vectors and $x_i^{\epsilon_i} = x_i \oplus \epsilon_i$, is a $C(2^{n-1} + 1, 1, 2, n)$-circuit. 


Theorem 12.2.2 Let $f = f(x_1,\ldots,x_n)$ be a function and let $C$ be a $C(\infty, s, d, t)$-circuit 
computing $f$, where $s(1/2)^t \le 0.5$. Then either $f$ or its complement $\bar{f}$ has a 
minterm of size at most $n - n/(2(10t)^{d-2}) + t$. 


Proof. Let us apply to $C$, repeatedly, $d - 2$ times a random $(1/(10t))$-restriction. 
Each of these random restrictions, when applied to any bottom subcircuit of depth 2, 
transforms it by Lemma 12.2.1 with probability at least $1 - (1/2)^t$ from a t-OR-AND 
to a t-AND-OR (or conversely). If all these transformations succeed we can merge 
the new AND gates with those from the level above them and obtain a circuit with 
a smaller depth. As the total size of the circuit is at most $s$ and $s(1/2)^t \le 0.5$, 
we conclude that with probability at least $1/2$, all transformations succeed and $C$ is 
transformed into a $C(\infty, 1, 2, t)$-circuit. Each variable $x_i$, independently, is still a 
variable (i.e., has not been assigned a value) with probability $1/(10t)^{d-2}$. Thus 
the number of remaining variables is a binomial random variable with expectation 
$n/(10t)^{d-2}$ and a little smaller variance. By the standard estimates for binomial 
distributions (see Appendix A) the probability that at least $n/(2(10t)^{d-2})$ variables 
are still variables is more than $1/2$. Therefore, with positive probability, at most 
$n - n/(2(10t)^{d-2})$ of the variables have been fixed and the resulting restriction of $f$ 
has a $C(\infty, 1, 2, t)$-circuit; that is, its value can be fixed by assigning values to at 
most $t$ additional variables. This completes the proof. ∎


Corollary 12.2.3 For any $d \ge 2$, there is no 


$$C\left(\infty,\ \tfrac{1}{2} \cdot 2^{\frac{1}{10} n^{1/(d-1)}},\ d,\ \tfrac{1}{10} n^{1/(d-1)}\right)\text{-circuit}$$


that computes the parity function $f(x_1,\ldots,x_n) = x_1 \oplus \cdots \oplus x_n$. 


Proof. Assuming there is such a circuit we obtain, by Theorem 12.2.2, that the value 
of $f$ can be fixed by assigning values to at most $n - \frac{1}{2} n^{1/(d-1)} + \frac{1}{10} n^{1/(d-1)} < n$ 
variables. This is false, and hence there is no such circuit. ∎


The estimate in Corollary 12.2.3 is, in fact, nearly the best possible. Since every 
C(s, s’,d,t)-circuit can be transformed into a C((t + 1)s,s,d + 1,2)-circuit (by 
replacing each atom by an OR or AND of two copies of itself), Corollary 12.2.3 
easily implies that the depth d of any C(s, s’,d,t)-circuit of polynomial size that 
computes the parity of $n$ bits is at least $\Omega(\log n / \log\log n)$. This lower bound is also 
optimal. 
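
The step from Corollary 12.2.3 to the depth bound can be spelled out as follows. This is a sketch; the size bound $n^c$ and the use of base-2 logarithms are assumptions made for the illustration.

```latex
% Suppose parity has a depth-d circuit with at most n^c gates. After the transformation
% it is a C(\infty, n^c, d+1, 2)-circuit, hence also a C(\infty, n^c, d+1, t)-circuit
% for t = \tfrac{1}{10} n^{1/d}, provided t \ge 2. Corollary 12.2.3 with depth d+1 forces
n^c \;>\; \tfrac{1}{2}\cdot 2^{\frac{1}{10} n^{1/d}}
\;\Longrightarrow\; n^{1/d} \;<\; 10\,(c \log_2 n + 1)
\;\Longrightarrow\; d \;>\; \frac{\log_2 n}{\log_2\bigl(10\,(c \log_2 n + 1)\bigr)}
\;=\; \Omega\!\left(\frac{\log n}{\log\log n}\right).
% If instead t < 2, then n^{1/d} < 20 and d > \log_2 n / \log_2 20, which is even larger.
```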


12.3 MORE ON BOUNDED-DEPTH CIRCUITS 


In the previous section we saw that the parity function is hard to compute in small 
depth using AND, OR and NOT gates. It turns out that even if we allow the use 
of parity gates (in addition to the AND, OR and NOT gates) there are still some 
relatively simple functions that are hard to compute. Such a result was first proved by 
Razborov (1987). His method was modified and strengthened by Smolensky (1987). 
For an integer $k \ge 2$, let $\mathrm{MOD}_k(x_1, x_2, \ldots, x_n)$ be the Boolean function whose 
value is 1 iff $\sum x_i \not\equiv 0 \pmod{k}$. Smolensky showed that for every two powers $p$ 
and $q$ of distinct primes, the function $\mathrm{MOD}_p$ cannot be computed in a bounded-depth 
polynomial-size circuit that uses AND, OR, NOT and $\mathrm{MOD}_q$ gates. Here we present 
the special case of this result in which $q = 3$ and $p = 2$. 

Let $C$ be an arbitrary circuit of depth $d$ and size $s$ consisting of AND, OR, NOT 
and $\mathrm{MOD}_3$ gates. A crucial fact, due to Razborov, is the assertion that the output of $C$ 
can be approximated quite well (depending on $d$ and $s$) by a polynomial of relatively 
small degree over GF(3). This is proved by applying the probabilistic method as 
follows. Let us replace each gate of the circuit $C$ by an approximate polynomial 
operation, according to the following rules, which guarantee that in each vertex in 
the new circuit we compute a polynomial over GF(3), whose values are all 0 or 1 
(whenever the input is a 0-1 input). 


(i) Each NOT gate $\bar{y}$ is replaced by the polynomial gate $(1 - y)$. 


(ii) Each MOD gate $\mathrm{MOD}_3(y_1, \ldots, y_m)$ is replaced by the polynomial gate 
$(y_1 + y_2 + \cdots + y_m)^2$. 


The rule for replacement of OR and AND gates is a little more complicated. 
Observe that in the two previous cases (i) and (ii) there was no approximation; the 
new gates compute precisely what the old ones did, for all possible Boolean values 
of the variables. This can, in principle, be done here too. An AND gate $y_1 \wedge \cdots \wedge y_m$ 
should simply be replaced by the product $y_1 \cdots y_m$. An OR gate $y_1 \vee \cdots \vee y_m$ can 
then be computed by De Morgan’s rules. Since $y_1 \vee \cdots \vee y_m = \overline{(\bar{y}_1 \wedge \cdots \wedge \bar{y}_m)}$ and 
$\bar{y}$ is realized by $(1 - y)$, this would give 


$$1 - \prod_{i=1}^{m} (1 - y_i). \qquad (12.6)$$


The trouble is that this procedure would increase the degree of our polynomials too 
much. Hence we need to be a little more tricky. Let $\ell$ be an integer, to be chosen 
later. Given an OR gate $y_1 \vee \cdots \vee y_m$, we choose $\ell$ random subsets $I_1, \ldots, I_\ell$ of 
$\{1,\ldots,m\}$, where for each $1 \le i \le \ell$ and for each $1 \le j \le m$ independently 
$\Pr[j \in I_i] = 1/2$. Observe that for each fixed $i$, $1 \le i \le \ell$, the sum $(\sum_{j \in I_i} y_j)^2$ 
over GF(3) is certainly 0 if $y_1 \vee \cdots \vee y_m = 0$ and is 1 with probability at least 1/2 
if $y_1 \vee \cdots \vee y_m = 1$. Hence, if we compute the OR function of the $\ell$ expressions 
$(\sum_{j \in I_i} y_j)^2$, $1 \le i \le \ell$, this function is 0 if $y_1 \vee \cdots \vee y_m = 0$ and is 1 with probability 
at least $1 - (1/2)^\ell$ if $y_1 \vee \cdots \vee y_m = 1$. We thus compute the OR and write it as a 
polynomial, in the way explained in equation (12.6). This gives 


$$1 - \prod_{i=1}^{\ell} \left( 1 - \Bigl( \sum_{j \in I_i} y_j \Bigr)^2 \right). \qquad (12.7)$$


Therefore in our new circuit we replace each OR gate by an approximation 
polynomial gate of the form described in (12.7). Once we have an approximation to 
an OR gate we can obtain the corresponding one for an AND gate by applying De 
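Morgan’s rules. Since $y_1 \wedge \cdots \wedge y_m = \overline{(\bar{y}_1 \vee \cdots \vee \bar{y}_m)}$ we replace each AND gate 
of the form $y_1 \wedge \cdots \wedge y_m$ by 


$$\prod_{i=1}^{\ell} \left( 1 - \Bigl( \sum_{j \in I_i} (1 - y_j) \Bigr)^2 \right). \qquad (12.8)$$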
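
The two approximating gates can be evaluated directly over GF(3). The sketch below is an illustration with hypothetical function names; indices run over 0,...,m-1 rather than 1,...,m, and arithmetic is done modulo 3.

```python
import random
from itertools import product

def random_subsets(m, ell, rng):
    """Choose ell random subsets of {0, ..., m-1}, each index kept with probability 1/2."""
    return [[j for j in range(m) if rng.random() < 0.5] for _ in range(ell)]

def approx_or(y, subsets):
    """Value over GF(3) of 1 - prod_i (1 - (sum_{j in I_i} y_j)^2) on a 0/1 vector y."""
    prod = 1
    for I in subsets:
        s = sum(y[j] for j in I) % 3
        prod = (prod * (1 - s * s)) % 3
    return (1 - prod) % 3

def approx_and(y, subsets):
    """Value over GF(3) of prod_i (1 - (sum_{j in I_i} (1 - y_j))^2) on a 0/1 vector y."""
    prod = 1
    for I in subsets:
        s = sum(1 - y[j] for j in I) % 3
        prod = (prod * (1 - s * s)) % 3
    return prod

def or_error_count(m, subsets):
    """Number of 0/1 inputs on which approx_or differs from the true OR."""
    return sum(approx_or(y, subsets) != max(y) for y in product((0, 1), repeat=m))
```

The error is one-sided: `approx_or` never outputs 1 when the true OR is 0, and `approx_and` never outputs 0 when the true AND is 1. Averaged over the random subsets, `or_error_count` is at most $2^m / 2^\ell$.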


Observe that the polynomials in (12.7) and (12.8) are both of degree at most $2\ell$. 
Given the original circuit $C$ of depth $d$ and size $s$, we can now replace all its gates 
by our approximative polynomial gates and get a new circuit $C_P$, which depends 
on all the random choices made in each replacement of each of the AND/OR gates. 
The new circuit $C_P$ computes a polynomial $P(x_1,\ldots,x_n)$ of degree at most $(2\ell)^d$. 
Moreover, for every fixed Boolean assignment to $x_1, x_2, \ldots, x_n$, the probability that 
all the new gates compute exactly what the corresponding gates in $C$ computed is at 
least $1 - s/2^\ell$. Therefore the expected number of inputs on which $P(x_1,\ldots,x_n)$ is 
equal to the output of $C$ is at least $2^n(1 - s/2^\ell)$. We have thus proved the following. 


Lemma 12.3.1 For any circuit $C$ of depth $d$ and size $s$ on $n$ Boolean variables that 
uses NOT, OR, AND and $\mathrm{MOD}_3$ gates and for any integer $\ell$, there is a polynomial 
$P = P(x_1,\ldots,x_n)$ of degree at most $(2\ell)^d$ over GF(3) whose value is equal to the 
output of $C$ on at least $2^n(1 - s/2^\ell)$ inputs. 


In order to apply this lemma for obtaining lower bounds for the size of any circuit 
of the above type that computes the parity function, we need the following additional 
combinatorial result. 


Lemma 12.3.2 For $n \ge 20$, there is no polynomial $P(x_1,\ldots,x_n)$ over GF(3) of 
degree at most $\sqrt{n}$ which is equal to the parity of $x_1,\ldots,x_n$ for a set $S$ of at least 
$0.9 \cdot 2^n$ distinct binary vectors $(x_1,\ldots,x_n)$. 


Proof. Suppose this is false, and suppose $S \subset \{0,1\}^n$, $|S| \ge 0.9 \cdot 2^n$ and 
$P(x_1,\ldots,x_n) = x_1 \oplus \cdots \oplus x_n$ for all $(x_1,\ldots,x_n) \in S$. Define a polynomial 
$Q = Q(y_1,\ldots,y_n)$ by 

$$Q(y_1,\ldots,y_n) = P(y_1 + 2,\ldots,y_n + 2) - 2$$

and let 

$$T = \{(y_1,\ldots,y_n) \in \{1,-1\}^n : (y_1 + 2,\ldots,y_n + 2) \in S\},$$

where all additions are in GF(3). Clearly $Q$ has degree at most $\sqrt{n}$ and satisfies 
$Q(y_1,\ldots,y_n) = \prod_{i=1}^{n} y_i$ for all $(y_1,\ldots,y_n) \in T$. Let now $G = G(y_1,\ldots,y_n)$ be 
an arbitrary function from $T$ to GF(3). Extend it in an arbitrary way to a function from 
$(GF(3))^n$ to GF(3) and write this function as a polynomial in $n$ variables. [Trivially, 
any function from $(GF(3))^n$ to GF(3) is a polynomial. This follows from the fact 
that it is a linear combination of functions of the form $\prod_{i=1}^{n} (y_i - \epsilon_i)(y_i - \epsilon_i - 1)$, 
where $\epsilon_i \in GF(3)$.] Replace each occurrence of $y_i^2$ in this polynomial by 1 to obtain 
a multilinear polynomial $\tilde{G}$ that agrees with $G$ on $T$. Now replace each monomial 
$\prod_{i \in U} y_i$, where $|U| > n/2 + \sqrt{n}/2$, by $\prod_{i \notin U} y_i \cdot Q(y_1,\ldots,y_n)$ and replace this 
new polynomial by a multilinear one, $G'$, again by replacing each $y_i^2$ by 1. Since for 
$y_i \in \{\pm 1\}$, $\prod_{i \notin U} y_i \cdot \prod_{i=1}^{n} y_i = \prod_{i \in U} y_i$, $G'$ is equal to $G$ on $T$ and its degree is at 
most $n/2 + \sqrt{n}/2$. However, the number of possible $G'$ is 
$3^{\sum_{i=0}^{\lfloor n/2 + \sqrt{n}/2 \rfloor} \binom{n}{i}} < 3^{0.9 \cdot 2^n}$, 
whereas the number of possible $G$ is $3^{|T|} \ge 3^{0.9 \cdot 2^n}$. This is impossible, and hence 
the assertion of the lemma holds. ∎
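
The counting step at the end of the proof needs the number of multilinear monomials of degree at most $n/2 + \sqrt{n}/2$ to be a fraction strictly below 0.9 of all $2^n$ monomials. The following sketch (with a hypothetical function name) computes that fraction exactly, so the claim can be checked for concrete $n \ge 20$.

```python
from math import comb, floor, sqrt

def low_degree_fraction(n):
    """Fraction of subsets U of an n-set with |U| <= n/2 + sqrt(n)/2."""
    top = floor(n / 2 + sqrt(n) / 2)
    return sum(comb(n, i) for i in range(top + 1)) / 2 ** n
```

The fraction is largest near perfect squares such as $n = 25$, where it is about 0.885; for large $n$ it tends to the normal probability of being at most one standard deviation above the mean, about 0.84.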


Corollary 12.3.3 There is no circuit of depth $d$ and size $s \le 0.1 \cdot 2^{\frac{1}{2} n^{1/(2d)}}$ 
computing the parity of $x_1, x_2, \ldots, x_n$ using NOT, AND, OR and $\mathrm{MOD}_3$ gates. 


Proof. Suppose this is false and let $C$ be such a circuit. Put $\ell = \frac{1}{2} n^{1/(2d)}$. By 
Lemma 12.3.1 there is a polynomial $P = P(x_1,\ldots,x_n)$ over GF(3), whose degree 
is at most $(2\ell)^d = \sqrt{n}$, which is equal to the parity of $x_1,\ldots,x_n$ on at least 
$2^n (1 - s/2^{\frac{1}{2} n^{1/(2d)}}) \ge 0.9 \cdot 2^n$ inputs. This contradicts Lemma 12.3.2 and hence 
completes the proof. ∎


12.4 MONOTONE CIRCUITS 


A Boolean function $f = f(x_1,\ldots,x_n)$ is monotone if $f(x_1,\ldots,x_n) = 1$ and 
$x_i \le y_i$ imply $f(y_1,\ldots,y_n) = 1$. A binary monotone circuit is a binary circuit 
that contains only binary AND and OR gates. It is easy to see that a function 
is monotone if and only if there is a binary monotone circuit that computes it. 
The monotone complexity of a monotone function is the smallest size of a binary 
monotone circuit that computes it. Until 1985, the largest known lower bound for 
the monotone complexity of a monotone NP-function of n variables was 4n. This 
was considerably improved in the fundamental paper of Razborov (1985), where a 
bound of $n^{\Omega(\log n)}$ for the k-clique function (which is 1 iff a given graph contains 
a clique of size k) is established. Shortly afterwards, Andreev (1985) used similar 
methods to obtain an exponential lower bound to a somewhat unnatural NP-function. 
Alon and Boppana (1987) strengthened the combinatorial arguments of Razborov 
and proved an exponential lower bound for the monotone circuit complexity of the 
clique function. In this section we describe a special case of this bound by showing 
that there are no linear size monotone circuits that decide if a given graph contains a 
triangle. Although this result is much weaker than the ones stated above, it illustrates 
nicely all the probabilistic considerations in the more complicated proofs and avoids 
some of the combinatorial subtleties, whose detailed proofs can be found in the above 
mentioned papers. 

Put $n = \binom{m}{2}$ and let $x_1, x_2, \ldots, x_n$ be $n$ Boolean variables representing the 
edges of a graph on the set of vertices $\{1,2,\ldots,m\}$. Let $T = T(x_1,\ldots,x_n)$ be 
the monotone Boolean function whose value is 1 if the corresponding graph contains 
a triangle. Clearly there is a binary monotone circuit of size $O(m^3)$ computing $T$. 
Thus the following theorem is tight, up to a polylogarithmic factor. 


Theorem 12.4.1 The monotone circuit complexity of $T$ is at least $\Omega(m^3 / \log^4 m)$. 


Before we present the proof of this theorem we introduce some notation and prove 
a simple lemma. For any Boolean function $f = f(x_1,\ldots,x_n)$ define 


$$A(f) = \{(x_1,\ldots,x_n) \in \{0,1\}^n : f(x_1,\ldots,x_n) = 1\}.$$

Clearly $A(f \vee g) = A(f) \cup A(g)$ and $A(f \wedge g) = A(f) \cap A(g)$. Let $C$ be a 
monotone circuit of size $s$ computing the function $f = f(x_1,\ldots,x_n)$. Clearly 
$C$ supplies a monotone straight-line program of length $s$ computing $f$; that is, 
a sequence of functions $x_1, x_2, \ldots, x_n, f_1, \ldots, f_s$, where $f_s = f$ and each $f_i$, 
for $1 \le i \le s$, is either an OR or an AND of two of the previous functions. 
By applying the operation $A$ we obtain a sequence $A(C)$ of subsets of $\{0,1\}^n$: 
$A_{-n} = A_{x_n}, \ldots, A_{-1} = A_{x_1}, A_1, \ldots, A_s$, where $A_{x_i} = A(x_i)$, $A_j = A(f_j)$ 
and each $A_j$ for $1 \le j \le s$ is either a union or an intersection of two of the 
previous subsets. Let us replace the sequence $A(C)$ by an approximating sequence 
$M(C)$: $M_{-n} = M_{x_n} = A_{x_n}, \ldots, M_{-1} = M_{x_1} = A_{x_1}, M_1, \ldots, M_s$ defined 
by replacing the union and intersection operations in $A(C)$ by the approximating 
operations $\sqcup$ and $\sqcap$, respectively. The exact definition of these two operations will 
be given later, in such a way that for all admissible $M$ and $L$ the inclusions 

$$M \sqcup L \supseteq M \cup L \quad \text{and} \quad M \sqcap L \subseteq M \cap L \qquad (12.9)$$

will hold. Thus $M_{x_i} = A_{x_i}$ for all $1 \le i \le n$ and if for some $1 \le j \le s$ we have 
$A_j = A_\ell \cup A_k$ then $M_j = M_\ell \sqcup M_k$, whereas if $A_j = A_\ell \cap A_k$ then $M_j = M_\ell \sqcap M_k$. 
In the former case put $\delta_\sqcup^j = M_j - (M_\ell \cup M_k)$ and $\delta_\sqcap^j = \emptyset$, and in the latter case put 
$\delta_\sqcap^j = (M_\ell \cap M_k) - M_j$ and $\delta_\sqcup^j = \emptyset$. 

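Lemma 12.4.2 For all members $M_i$ of $M(C)$, 


$$A_i \setminus \bigcup_{j \le i} \delta_\sqcap^j \subseteq M_i \subseteq A_i \cup \bigcup_{j \le i} \delta_\sqcup^j. \qquad (12.10)$$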


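Proof. We apply induction on $i$. For $i < 0$, $M_i = A_i$ and thus (12.10) holds. 
Assuming (12.10) holds for all $M_j$ with $j < i$ we prove it for $i$. If $A_i = A_\ell \cup A_k$, 
then, by the induction hypothesis, 


$$M_i = M_\ell \cup M_k \cup \delta_\sqcup^i \subseteq A_\ell \cup A_k \cup \bigcup_{j \le i} \delta_\sqcup^j = A_i \cup \bigcup_{j \le i} \delta_\sqcup^j,$$

and 

$$M_i = M_\ell \sqcup M_k \supseteq M_\ell \cup M_k \supseteq \Bigl(A_\ell \setminus \bigcup_{j \le \ell} \delta_\sqcap^j\Bigr) \cup \Bigl(A_k \setminus \bigcup_{j \le k} \delta_\sqcap^j\Bigr) \supseteq A_i \setminus \bigcup_{j \le i} \delta_\sqcap^j,$$

as needed. If $A_i = A_\ell \cap A_k$, the proof is similar. ∎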


Lemma 12.4.2 holds for any choice of the operations $\sqcup$ and $\sqcap$ that satisfies (12.9). 
In order to prove Theorem 12.4.1 we define these operations as follows. Put $r = 
100 \log^2 m$. For any set $R$ of at most $r$ edges on $V = \{1,2,\ldots,m\}$, let $[R]$ 
denote the set of all graphs on $V$ containing at least one edge of $R$. In particular, 
$[\emptyset]$ is the empty set. We also let $[*]$ denote the set of all graphs. The elements 
of $M(C)$ will all have the form $[R]$ or $[*]$. Note that $A_{x_i} = M_{x_i}$ is simply the 
set $[R]$, where $R$ is a singleton containing the appropriate single edge. For two 
sets $R_1$ and $R_2$ of at most $r$ edges each, we define $[R_1] \sqcap [R_2] = [R_1 \cap R_2]$, 
$[R_1] \sqcap [*] = [R_1]$ and $[*] \sqcap [*] = [*]$. Similarly, if $|R_1 \cup R_2| \le r$ we define 
$[R_1] \sqcup [R_2] = [R_1 \cup R_2]$, whereas if $|R_1 \cup R_2| > r$ then $[R_1] \sqcup [R_2] = [*]$. 
Finally $[*] \sqcup [R_1] = [*] \sqcup [*] = [*]$. 
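
These two operations are simple enough to write down and test against the inclusions (12.9) on a small vertex set. The sketch below is an illustration: the names `meet`, `join`, `contains` and `all_graphs` are hypothetical, an approximator $[R]$ is represented by the frozenset $R$ of edges, and $[*]$ by a sentinel object.

```python
from itertools import combinations

STAR = object()  # stands for [*], the set of all graphs

def meet(a, b):
    """Approximate intersection: [R1] meet [R2] = [R1 & R2], [R] meet [*] = [R]."""
    if a is STAR:
        return b
    if b is STAR:
        return a
    return a & b

def join(a, b, r):
    """Approximate union: [R1 | R2] if it has at most r edges, otherwise [*]."""
    if a is STAR or b is STAR:
        return STAR
    u = a | b
    return u if len(u) <= r else STAR

def contains(a, graph):
    """True iff the graph (a frozenset of edges) belongs to the family [R] or [*]."""
    return a is STAR or bool(a & graph)

def all_graphs(m):
    """All graphs on the vertex set {0, ..., m-1}, each as a frozenset of edges."""
    edges = list(combinations(range(m), 2))
    return [frozenset(s) for k in range(len(edges) + 1)
            for s in combinations(edges, k)]
```

Enumerating `all_graphs(4)` and comparing `contains(join(R1, R2, r), g)` with `contains(R1, g) or contains(R2, g)` (and similarly for `meet`) confirms on a small case that the approximate union only adds graphs and the approximate intersection only removes them.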


Proof [Theorem 12.4.1]. We now prove Theorem 12.4.1 by showing that there is no 
monotone circuit of size $s < \binom{m}{3}/2r^2$ computing the function $T$. Indeed, suppose 
this is false and let $C$ be such a circuit. Let $M(C) = M_{x_n}, \ldots, M_{x_1}, M_1, \ldots, M_s$ 
be an approximating sequence of length $s$ obtained from $C$ as described above. By 
Lemma 12.4.2, 


$$A(T) \setminus \bigcup_{j \le s} \delta_\sqcap^j \subseteq M_s \subseteq A(T) \cup \bigcup_{j \le s} \delta_\sqcup^j. \qquad (12.11)$$


We consider two possible cases. 


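Case 1. $M_s = [R]$, where $|R| \le r$. 


Let us choose a random triangle $\Delta$ on $\{1, 2, \ldots, m\}$. Clearly 

$$\Pr[\Delta \in M_s] \le \frac{r(m-2)}{\binom{m}{3}} < \frac{1}{2}.$$

Moreover, for each fixed $j$, $j \le s$, 

$$\Pr[\Delta \in \delta_\sqcap^j] \le \frac{r^2}{\binom{m}{3}}.$$

This is because if $\delta_\sqcap^j \ne \emptyset$, then $\delta_\sqcap^j = ([R_1] \cap [R_2]) \setminus [R_1 \cap R_2]$ for some 
two sets of edges $R_1, R_2$, each of cardinality at most $r$. The only triangles 
in this difference are those containing an edge from $R_1$ and another edge 
from $R_2$ (and no edge of both). Since there are at most $r^2$ such triangles the 
last inequality follows. Since $s < \binom{m}{3}/2r^2$ the last two inequalities imply 
that $\Pr[\Delta \notin M_s \text{ and } \Delta \notin \bigcup_{j \le s} \delta_\sqcap^j] > 0$ and thus there is such a triangle $\Delta$. 
Since this triangle belongs to $A(T)$ this contradicts (12.11), showing that this 
case is impossible. 


Case 2. $M_s = [*]$. 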


Let $B$ be a random spanning complete bipartite graph on $V = \{1,2,\ldots,m\}$ 
obtained by coloring each vertex in $V$ randomly and independently by 0 or 1 
and taking all edges connecting vertices with distinct colors. Since $M_s$ is the 
set of all graphs, $B \in M_s$. Also $B \notin A(T)$, as it contains no triangle. We 
claim that for every fixed $j$, $j \le s$, 


$$\Pr[B \in \delta_\sqcup^j] \le 2^{-\sqrt{r}/2} < \frac{1}{m^5}. \qquad (12.12)$$


Indeed, if $\delta_\sqcup^j \ne \emptyset$, then $\delta_\sqcup^j = [*] \setminus ([R_1] \cup [R_2])$, where $|R_1 \cup R_2| > r$. 
Consider the graph whose set of edges is $R_1 \cup R_2$. Let $d$ be its maximum 
degree. By Vizing’s Theorem the set of its edges can be partitioned into at 
most $d + 1$ matchings. Thus either $d > \sqrt{r}/2$ or the size of the maximum 
matching in this graph is at least $\sqrt{r}/2$. It follows that our graph contains a set 
of $k = \sqrt{r}/2$ edges $e_1, \ldots, e_k$ that form either a star or a matching. In each of 
these two cases $\Pr[e_i \in B] = \frac{1}{2}$ and these events are mutually independent. 
Hence 


$$\Pr[B \notin [R_1] \cup [R_2]] \le 2^{-\sqrt{r}/2},$$


implying (12.12). Note that a similar estimate can be established without
Vizing's Theorem by observing that B does not belong to $\lceil R_1 \rceil \cup \lceil R_2 \rceil$ if and
only if the vertices in any connected component of the graph whose edges are
$R_1 \cup R_2$ belong to the same color class of B.




Since $s < \binom{m}{3}/2r^2 < m^5$, inequality (12.12) implies that there is a bipartite
B such that $B \in M_s$, $B \notin A(T)$ and $B \notin \bigcup_{j \le s} \delta^j_\sqcup$. This contradicts (12.11),
shows that this case is impossible as well and hence completes the proof of
Theorem 12.4.1.


12.5 FORMULAE 


Recall that a formula is a circuit in which every fanout is at most 1. Unlike in the
case of circuits, there are known superlinear lower bounds for the minimum size of
formulae computing various explicit NP-functions over the full binary basis. For a
Boolean function $f = f(x_1,\ldots,x_n)$, let us denote by L(f) the minimum number of
AND and OR gates in a formula that uses AND, OR and NOT gates and computes f.
By De Morgan's rules we may assume that all NOT gates appear in the first level of
this formula. We conclude this chapter with a simple result of Subbotovskaya (1961),
which implies that for the parity function $f = x_1 \oplus \cdots \oplus x_n$, $L(f) \ge \Omega(n^{3/2})$. This
bound was improved later by Khrapchenko (1971) to $L(f) = n^2 - 1$. However, we
present here only the weaker $\Omega(n^{3/2})$ lower bound, not only because it demonstrates,
once more, the power of relatively simple probabilistic arguments, but also because a
modification of this proof enabled Andreev (1987) to obtain an $\Omega(n^{5/2}/(\log n)^{O(1)})$
lower bound for L(g) for another NP-function $g = g(x_1,\ldots,x_n)$. Håstad (1998)
later improved this lower bound to $\Omega(n^{3-o(1)})$. This is at present the largest known
lower bound for the formula complexity of an NP-function of n variables over a
complete basis.

The method of Subbotovskaya is based on random restrictions similar to the ones 
used in Section 12.2. The main lemma is the following. 


Lemma 12.5.1 Let $f = f(x_1,\ldots,x_n)$ be a nonatom Boolean function of n variables.
Then there is an i, $1 \le i \le n$, and an $\varepsilon \in \{0,1\}$ such that for the function
$g = f(x_1,\ldots,x_{i-1},\varepsilon,x_{i+1},\ldots,x_n)$ of n - 1 variables obtained from f by
substituting $x_i = \varepsilon$, the following inequality holds:

$$(L(g) + 1) \le \left(1 - \frac{3}{2n}\right)(L(f) + 1) \le \left(1 - \frac{1}{n}\right)^{3/2}(L(f) + 1).$$


Proof. Fix a formula F computing f with $\ell = L(f)$ AND and OR gates. F can
be represented by a binary tree each of whose $\ell + 1$ leaves is labeled by an atom $x_i$
or $\bar{x}_i$. Let us choose, randomly, a variable $x_i$, $1 \le i \le n$, according to a uniform
distribution, and assign to it a random binary value $\varepsilon \in \{0,1\}$. When we substitute
the values $\varepsilon$ and $1 - \varepsilon$ to $x_i$ and $\bar{x}_i$, respectively, the number of leaves in F is reduced;
the expected number of leaves omitted in this manner is $(\ell + 1)/n$. However, further
reduction may occur.
Indeed, suppose a leaf is labeled $x_i$ and it feeds, say, an AND gate $x_i \wedge H$ in F.
Observe that we may assume that the variable $x_i$ does not appear in the subformula
H, as otherwise F can be simplified by substituting $x_i = 1$ in H. If $x_i = \varepsilon = 0$,
then H can be deleted once we substitute the value for $x_i$, thus further decreasing the
number of leaves.

Since the behavior of this effect is similar for an OR gate (and also for $\bar{x}_i$ instead
of $x_i$), it follows that the expected number of additional leaves omitted is at least
$(\ell + 1)/2n$. Hence the expected number of remaining leaves in the simplified formula
is at most $(\ell + 1)(1 - 3/2n)$, as claimed. ∎


By repeatedly applying Lemma 12.5.1 we obtain the following. 


Corollary 12.5.2 If $f = f(x_1,\ldots,x_n)$ and $L(f) \le (n/k)^{3/2} - 1$, then one can
assign values to n - k variables so that the resulting function g is an atom.


Proof. Repeated application of Lemma 12.5.1 n - k times yields a g with

$$(L(g) + 1) \le \prod_{i=k+1}^{n} \left(1 - \frac{1}{i}\right)^{3/2} (L(f) + 1) = \left(\frac{k}{n}\right)^{3/2} (L(f) + 1) \le 1 .$$

Hence g is either $x_i$ or $\bar{x}_i$ for some i. ∎
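
The two elementary facts used above, the per-step inequality $1 - 3/(2n) \le (1 - 1/n)^{3/2}$ and the telescoping of $\prod_{i=k+1}^{n}(1 - 1/i)$ to $k/n$, can be checked exactly with rational arithmetic. The following is only an illustrative sketch; the function names `shrink_product` and `per_step_ok` are our own and do not appear in the text.

```python
from fractions import Fraction


def shrink_product(n: int, k: int) -> Fraction:
    """Product of (1 - 1/i) for i = k+1..n; it telescopes to k/n."""
    prod = Fraction(1)
    for i in range(k + 1, n + 1):
        prod *= Fraction(i - 1, i)
    return prod


def per_step_ok(n: int) -> bool:
    """Check 1 - 3/(2n) <= (1 - 1/n)^(3/2) exactly by comparing squares."""
    lhs = Fraction(2 * n - 3, 2 * n)
    rhs_squared = Fraction(n - 1, n) ** 3
    # A nonpositive left side is trivially below the nonnegative right side.
    return lhs <= 0 or lhs * lhs <= rhs_squared


if __name__ == "__main__":
    print(shrink_product(10, 2))                      # the base k/n of the 3/2 power
    print(all(per_step_ok(n) for n in range(1, 300)))
```

Raising the telescoped product to the power 3/2 gives the factor $(k/n)^{3/2}$ appearing in the proof.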


Corollary 12.5.3 For the parity function $f = x_1 \oplus \cdots \oplus x_n$,

$$L(f) > \left(\frac{n}{2}\right)^{3/2} - 1 .$$


12.6 EXERCISES 


1. Show that there exists a constant c such that the number of binary Boolean
circuits of size s on n variables is at most $(c(s + n))^s$.


2. Let f be a Boolean formula in the n variables $x_1, x_2, \ldots, x_n$, where f is an
AND of an arbitrary (finite) number of clauses; each clause is an OR of 10
literals, where each literal is either a variable or its negation, and suppose each
variable appears (negated or unnegated) in at most 10 clauses. Prove that f is
satisfiable.


3. (*) Prove that there is a bounded-depth, polynomial-size, monotone circuit of
n Boolean inputs $x_1, x_2, \ldots, x_n$ computing a function f whose value is 1 if
$\sum_{i=1}^{n} x_i \ge n/2 + n/\log n$ and is 0 if $\sum_{i=1}^{n} x_i \le n/2 - n/\log n$.


THE PROBABILISTIC LENS: 
Maximal Antichains 


A family $\mathcal{F}$ of subsets of $\{1,\ldots,n\}$ is called an antichain if no set of $\mathcal{F}$ is contained
in another.


Theorem 1 Let $\mathcal{F}$ be an antichain. Then

$$\sum_{A \in \mathcal{F}} \frac{1}{\binom{n}{|A|}} \le 1 .$$


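Proof. Let $\sigma$ be a uniformly chosen permutation of $\{1,\ldots,n\}$ and set

$$\mathcal{C}_\sigma = \{\{\sigma(j) : 1 \le j \le i\} : 0 \le i \le n\} .$$

(The cases i = 0 and i = n give $\emptyset \in \mathcal{C}_\sigma$ and $\{1,\ldots,n\} \in \mathcal{C}_\sigma$, respectively.) Define
a random variable

$$X = |\mathcal{F} \cap \mathcal{C}_\sigma| .$$

We decompose

$$X = \sum_{A \in \mathcal{F}} X_A ,$$

where $X_A$ is the indicator random variable for $A \in \mathcal{C}_\sigma$. Then

$$E[X_A] = \Pr[A \in \mathcal{C}_\sigma] = \frac{1}{\binom{n}{|A|}} ,$$

since $\mathcal{C}_\sigma$ contains precisely one set of size |A|, which is distributed uniformly among
the |A|-sets. By linearity of expectation,

$$E[X] = \sum_{A \in \mathcal{F}} \frac{1}{\binom{n}{|A|}} .$$

For any $\sigma$, $\mathcal{C}_\sigma$ forms a chain: every pair of sets is comparable. Since $\mathcal{F}$ is an
antichain we must have $X = |\mathcal{F} \cap \mathcal{C}_\sigma| \le 1$. Thus $E[X] \le 1$. ∎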


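Corollary 2 [Sperner's Theorem] Let $\mathcal{F}$ be an antichain. Then

$$|\mathcal{F}| \le \binom{n}{\lfloor n/2 \rfloor} .$$


Proof. The function $\binom{n}{x}$ is maximized at $x = \lfloor n/2 \rfloor$ so that

$$1 \ge \sum_{A \in \mathcal{F}} \frac{1}{\binom{n}{|A|}} \ge \frac{|\mathcal{F}|}{\binom{n}{\lfloor n/2 \rfloor}} . \qquad ∎$$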
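
For small n the identity $E[X] = \sum_{A \in \mathcal{F}} 1/\binom{n}{|A|}$ can be confirmed by enumerating all n! permutations. The sketch below does this for two antichains on $\{1,\ldots,4\}$; the helper names `lym_sum` and `expected_hits` are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import comb


def lym_sum(family, n):
    """Sum of 1 / C(n, |A|) over the sets A of the family."""
    return sum(Fraction(1, comb(n, len(A))) for A in family)


def expected_hits(family, n):
    """E|F ∩ C_sigma|, computed by brute force over all n! permutations."""
    fam = {frozenset(A) for A in family}
    total = 0
    count = 0
    for sigma in permutations(range(1, n + 1)):
        # The chain of prefixes of sigma, from the empty set to the full set.
        chain = {frozenset(sigma[:i]) for i in range(n + 1)}
        total += len(fam & chain)
        count += 1
    return Fraction(total, count)


if __name__ == "__main__":
    n = 4
    middle_layer = list(combinations(range(1, n + 1), 2))
    mixed = [{1}, {2, 3}, {2, 4}, {3, 4}]
    for fam in (middle_layer, mixed):
        print(lym_sum(fam, n), expected_hits(fam, n))
```

The middle layer attains the value 1, which is the equality case behind Sperner's Theorem.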
Discrepancy


The mystery, as well as the glory, of mathematics lies not so much in the fact that 
abstract theories do turn out to be useful in solving problems but in that wonder 
of wonders, the fact that a theory meant for solving one type of problem is often 
the only way of solving problems of entirely different kinds, problems for which 
the theory was not intended. These coincidences occur so frequently that they 
must belong to the essence of mathematics. 


— Gian-Carlo Rota 


13.1. BASICS 


Suppose we are given a finite family of finite sets. Our object is to color the underlying 
points red and blue so that all of the sets have nearly the same number of red and blue 
points. It may be that our cause is hopeless: if the family consists of all subsets
of a given set $\Omega$ then regardless of the coloring some set, either the red or the blue
points, will have size at least half that of $\Omega$ and be monochromatic. In the other
extreme, should the sets of the family be disjoint then it is trivial to color so that all 
sets have the same number of red and blue points or, at worst if the cardinality is odd, 
the number of red and blue points differ by only one. The discrepancy will measure 
how good a coloring we may find. 


The Probabilistic Method, Third Edition By Noga Alon and Joel Spencer 
To be formal, let a family $\mathcal{A}$ of subsets of $\Omega$ be given. Rather than using red and
blue we consider colorings as maps

$$\chi : \Omega \to \{-1,+1\} .$$

For any $A \subseteq \Omega$ we set

$$\chi(A) = \sum_{a \in A} \chi(a) .$$

Define the discrepancy of $\mathcal{A}$ with respect to $\chi$ by

$$\mathrm{disc}(\mathcal{A}, \chi) = \max_{A \in \mathcal{A}} |\chi(A)|$$

and the discrepancy of $\mathcal{A}$ by

$$\mathrm{disc}(\mathcal{A}) = \min_{\chi : \Omega \to \{-1,+1\}} \mathrm{disc}(\mathcal{A}, \chi) .$$


Other equivalent definitions of discrepancy reveal its geometric aspects. Let
$\mathcal{A} = \{S_1,\ldots,S_m\}$, $\Omega = \{1,\ldots,n\}$ and let $B = [b_{ij}]$ be the m x n incidence
matrix: $b_{ij} = 1$ if $j \in S_i$, otherwise $b_{ij} = 0$. A coloring $\chi$ may be associated with
the vector $u = (\chi(1),\ldots,\chi(n)) \in \{-1,+1\}^n$ so that $Bu^T = (\chi(S_1),\ldots,\chi(S_m))$
and

$$\mathrm{disc}(\mathcal{A}) = \min_{u \in \{-1,+1\}^n} |Bu^T|_\infty ,$$

where $|v|_\infty$ is the $L^\infty$-norm, the maximal absolute value of the coordinates. Similarly,
letting $v_j$ denote the jth column vector of B (the profile of point j)

$$\mathrm{disc}(\mathcal{A}) = \min |\pm v_1 \pm \cdots \pm v_n|_\infty ,$$

where the minimum ranges over all $2^n$ choices of sign.
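
For very small ground sets the definition can be evaluated directly by trying all $2^n$ sign vectors. The following brute-force sketch (the function name `disc` and the 1-based point labels are our own conventions) is meant only to make the definition concrete; it is exponential in n and not a practical algorithm.

```python
from itertools import product


def disc(sets, n):
    """Discrepancy of a family of subsets of {1,...,n}, by exhaustive search."""
    best = None
    for u in product((-1, 1), repeat=n):
        # u[j-1] is the color of point j; worst is disc(A, chi) for this coloring.
        worst = max(abs(sum(u[j - 1] for j in S)) for S in sets)
        if best is None or worst < best:
            best = worst
    return best


if __name__ == "__main__":
    intervals = [set(range(i, j + 1)) for i in range(1, 6) for j in range(i, 6)]
    print(disc(intervals, 5))            # intervals of {1,...,5}
    print(disc([{1, 2}, {3, 4, 5}], 5))  # disjoint sets, one of odd size
```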

We will generally be concerned with upper bounds to the discrepancy. Unraveling
the definitions, $\mathrm{disc}(\mathcal{A}) \le K$ if and only if there exists a coloring $\chi$ for which
$|\chi(A)| \le K$ for all $A \in \mathcal{A}$. Naturally, we try the random coloring.


Theorem 13.1.1 Let $\mathcal{A}$ be a family of n subsets of an m-set $\Omega$. Then

$$\mathrm{disc}(\mathcal{A}) \le \sqrt{2m \ln(2n)} .$$


Proof. Let $\chi : \Omega \to \{-1,+1\}$ be random. For $A \subseteq \Omega$ let $X_A$ be the indicator
random variable for $|\chi(A)| > \alpha$, where we set $\alpha = \sqrt{2m \ln(2n)}$. If |A| = a then
$\chi(A)$ has distribution $S_a$, so by Theorem A.1.1

$$E[X_A] = \Pr[|\chi(A)| > \alpha] < 2e^{-\alpha^2/2a} \le 2e^{-\alpha^2/2m} = \frac{1}{n}$$

by our propitious choice of $\alpha$. Let X be the number of $A \in \mathcal{A}$ with $|\chi(A)| > \alpha$ so
that

$$X = \sum_{A \in \mathcal{A}} X_A$$

and linearity of expectation gives

$$E[X] = \sum_{A \in \mathcal{A}} E[X_A] < |\mathcal{A}|(1/n) = 1 .$$

Thus for some $\chi$ we must have X = 0. This means $\mathrm{disc}(\mathcal{A}, \chi) \le \alpha$ and therefore
$\mathrm{disc}(\mathcal{A}) \le \alpha$. ∎
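
Since E[X] < 1, a random coloring succeeds with positive probability, so simply resampling until X = 0 finds a coloring within the bound. The sketch below illustrates this under our own conventions (points are labeled 0,...,m-1, and the name `random_coloring_within_bound` and the retry cap are assumptions, not part of the theorem).

```python
import math
import random


def random_coloring_within_bound(sets, m, rng, max_tries=1000):
    """Resample uniform colorings of {0,...,m-1} until every set A has
    |chi(A)| <= alpha, where alpha = sqrt(2 m ln(2n)) and n = len(sets)."""
    n = len(sets)
    alpha = math.sqrt(2 * m * math.log(2 * n))
    for _ in range(max_tries):
        chi = [rng.choice((-1, 1)) for _ in range(m)]
        if all(abs(sum(chi[a] for a in A)) <= alpha for A in sets):
            return chi, alpha
    raise RuntimeError("no coloring found within max_tries (very unlikely)")


if __name__ == "__main__":
    rng = random.Random(0)
    sets = [set(rng.sample(range(40), 20)) for _ in range(25)]
    chi, alpha = random_coloring_within_bound(sets, 40, rng)
    print(max(abs(sum(chi[a] for a in A)) for A in sets), "<=", alpha)
```

In practice the first sample almost always works, which reflects how much room the bound leaves.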


13.2 SIX STANDARD DEVIATIONS SUFFICE 


When $\mathcal{A}$ has both n sets and n points Theorem 13.1.1 gives

$$\mathrm{disc}(\mathcal{A}) = O(\sqrt{n \ln n}) .$$

This is improved by the following result. Its proof resembles that of the main result
of Beck (1981). The approach via entropy was suggested by R. Boppana.


Theorem 13.2.1 [Spencer (1985b)] Let $\mathcal{A}$ be a family of n subsets of an n-element
set $\Omega$. Then

$$\mathrm{disc}(\mathcal{A}) \le 6\sqrt{n} .$$


With $\chi : \Omega \to \{-1,+1\}$ random, $A \in \mathcal{A}$, $\chi(A)$ has zero mean and standard deviation at
most $\sqrt{n}$. If $|\chi(A)| > 6\sqrt{n}$ then $\chi(A)$ is at least six standard deviations off the mean.
The probability of this occurring is very small but a fixed positive constant and the
number of sets $A \in \mathcal{A}$ is going to infinity. In fact, a random $\chi$ almost always will not
work. The specific constant 6 (actually 5.32) was the result of specific calculations
that could certainly be further improved and will not concern us here. Rather, we
show Theorem 13.2.1 with the constant 11.

A map $\chi : \Omega \to \{-1,0,+1\}$ will be called a partial coloring. When $\chi(a) = 0$
we say a is uncolored. We define $\chi(A)$ as before.


Lemma 13.2.2 Let $\mathcal{A}$ be a family of n subsets of an n-set $\Omega$. Then there is a partial
coloring $\chi$ with at most $10^{-9}n$ points uncolored such that

$$|\chi(A)| \le 10\sqrt{n}$$

for all $A \in \mathcal{A}$.


Here the values 10 and $10^{-9}$ are not the best possible. The significant point is that
they are absolute constants. Label the sets of $\mathcal{A}$ by $A_1,\ldots,A_n$ for convenience. Let

$$\chi : \Omega \to \{-1,+1\}$$

be random. For $1 \le i \le n$ define

$$b_i = \text{nearest integer to } \frac{\chi(A_i)}{20\sqrt{n}} .$$




For example, $b_i = 0$ when $-10\sqrt{n} < \chi(A_i) < 10\sqrt{n}$ and $b_i = -3$ when $-70\sqrt{n} <
\chi(A_i) < -50\sqrt{n}$. From Theorem A.1.1 (as in Theorem 13.1.1),

$$\Pr[b_i = 0] > 1 - 2e^{-50},$$
$$\Pr[b_i = 1] = \Pr[b_i = -1] < e^{-50},$$
$$\Pr[b_i = 2] = \Pr[b_i = -2] < e^{-450},$$

and, in general,

$$\Pr[b_i = s] = \Pr[b_i = -s] < e^{-50(2s-1)^2} .$$

Now we bound the entropy $H(b_i)$. This important concept is explored more fully in
Section 15.7. Letting $p_j = \Pr[b_i = j]$,

$$H(b_i) = \sum_{j=-\infty}^{+\infty} -p_j \log_2 p_j \le (1 - 2e^{-50})[-\log_2(1 - 2e^{-50})] + 2e^{-50}[-\log_2 e^{-50}] + 2e^{-450}[-\log_2 e^{-450}] + \cdots .$$

The infinite sum clearly converges and is strongly dominated by the second term.
Calculation gives

$$H(b_i) \le \epsilon = 3 \times 10^{-20} .$$
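
The calculation can be reproduced numerically. The sketch below sums the first few terms of the upper bound on $H(b_i)$; the function name and the number of terms are our own choices. The quantity $1 - 2e^{-50}$ rounds to 1 in double precision, so the first term is evaluated with `math.log1p`, and the terms beyond $e^{-450}$ underflow harmlessly to zero.

```python
import math


def entropy_upper_bound(terms=4):
    """Numerical value of the bound on H(b_i): the term for b_i = 0 plus
    2 e^{-c}(-log2 e^{-c}) with c = 50 (2s-1)^2 for s = 1, ..., terms."""
    q = 2 * math.exp(-50)
    # -(1 - q) log2(1 - q), using log1p to avoid rounding 1 - q to 1.
    bound = -(1 - q) * math.log1p(-q) / math.log(2)
    for s in range(1, terms + 1):
        c = 50 * (2 * s - 1) ** 2
        # -log2(e^{-c}) = c / ln 2
        bound += 2 * math.exp(-c) * c / math.log(2)
    return bound


if __name__ == "__main__":
    print(entropy_upper_bound())  # to be compared with 3e-20
```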


Now consider the n-tuple $(b_1,\ldots,b_n)$. Of course, there may be correlation among
the $b_i$. Indeed, if $A_i$ and $A_j$ are nearly equal then $b_i$ and $b_j$ will usually be equal. But
by Proposition 15.7.2 entropy is subadditive. Hence

$$H((b_1,\ldots,b_n)) \le \sum_{i=1}^{n} H(b_i) \le \epsilon n .$$

If a random variable Z assumes no value with probability greater than $2^{-t}$ then
$H(Z) \ge t$. In contrapositive form, there is a particular n-tuple $(s_1,\ldots,s_n)$ so that

$$\Pr[(b_1,\ldots,b_n) = (s_1,\ldots,s_n)] \ge 2^{-\epsilon n} .$$

Our probability space was composed of the $2^n$ possible colorings $\chi$, all equally likely.
Thus, shifting to counting, there is a set $\mathcal{C}'$ consisting of at least $2^{(1-\epsilon)n}$ colorings
$\chi : \Omega \to \{-1,+1\}$, all having the same value $(b_1,\ldots,b_n)$.

Let us think of the class $\mathcal{C}$ of all colorings $\chi : \Omega \to \{-1,+1\}$ as the Hamming
cube $\{-1,+1\}^n$ endowed with the Hamming metric

$$\rho(\chi, \chi') = |\{a : \chi(a) \neq \chi'(a)\}| .$$

Kleitman (1966b) has proved that if $\mathcal{D} \subseteq \mathcal{C}$ and

$$|\mathcal{D}| \ge \sum_{i \le r} \binom{n}{i}$$

with $r \le n/2$ then $\mathcal{D}$ has diameter at least 2r. That is, the set of a given size with
minimal diameter is the ball. ($\mathcal{D}$ has diameter at least r trivially, which would suffice
to prove Lemma 13.2.2 and Theorem 13.2.1 with weaker values for the constants.)


Proof. In our case we may take $r = \alpha n$ as long as $\alpha < \frac{1}{2}$ and

$$2^{H(\alpha)} \le 2^{1-\epsilon} .$$

Calculation gives that we may take $\alpha = \frac{1}{2}(1 - 10^{-9})$ with room to spare. [The
Taylor series expansion gives

$$H\left(\frac{1}{2} - x\right) \sim 1 - \frac{2}{\ln 2} x^2$$

for x small.] Thus $\mathcal{C}'$ has diameter at least $n(1 - 10^{-9})$. Let $\chi_1, \chi_2 \in \mathcal{C}'$ be at
maximal distance. We set

$$\chi = \frac{\chi_1 - \chi_2}{2} .$$

$\chi$ is a partial coloring of $\Omega$. $\chi(a) = 0$ if and only if $\chi_1(a) = \chi_2(a)$, which occurs
for $n - \rho(\chi_1, \chi_2) \le 10^{-9}n$ coordinates a. Finally, and crucially, for each $1 \le i \le n$
the colorings $\chi_1, \chi_2$ yield the same value $b_i$, which means that $\chi_1(A_i)$ and $\chi_2(A_i)$
lie on a common interval of length $20\sqrt{n}$. Thus

$$|\chi(A_i)| = \left|\frac{\chi_1(A_i) - \chi_2(A_i)}{2}\right| \le 10\sqrt{n} ,$$

as desired. ∎
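
The claim that $\alpha = \frac{1}{2}(1 - 10^{-9})$ leaves room to spare amounts to $1 - H(\alpha) \ge \epsilon = 3 \times 10^{-20}$. The deficit $1 - H(\alpha)$ is far below double-precision resolution, so the sketch below uses Python's `decimal` module at 80 digits; the variable names and the chosen precision are our own.

```python
from decimal import Decimal, getcontext

getcontext().prec = 80  # enough digits to resolve a deficit near 1e-18


def binary_entropy(p: Decimal) -> Decimal:
    """H(p) = -p log2 p - (1 - p) log2(1 - p), in high precision."""
    ln2 = Decimal(2).ln()
    return -(p * p.ln() + (1 - p) * (1 - p).ln()) / ln2


x = Decimal(1) / 2 * Decimal(10) ** -9   # alpha = 1/2 - x
alpha = Decimal(1) / 2 - x
deficit = 1 - binary_entropy(alpha)      # exact-ish value of 1 - H(alpha)
taylor = 2 * x * x / Decimal(2).ln()     # leading Taylor term (2 / ln 2) x^2

if __name__ == "__main__":
    print(deficit)
    print(taylor)
```

Both numbers are about $7.2 \times 10^{-19}$, comfortably above $\epsilon$.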


Theorem 13.2.1 requires a coloring of all points whereas Lemma 13.2.2 leaves
$10^{-9}n$ points uncolored. The idea now is to iterate the procedure of Lemma 13.2.2,
coloring all but, say, $10^{-18}n$ of the uncolored points on the second coloration. We
cannot apply Lemma 13.2.2 directly since we have an asymmetric situation with n
sets and only $10^{-9}n$ points.


Lemma 13.2.3 Let $\mathcal{A}$ be a family of n subsets of an r-set $\Omega$ with $r \le 10^{-9}n$. Then
there is a partial coloring $\chi$ of $\Omega$ with at most $10^{-40}r$ points uncolored so that

$$|\chi(A)| < 10\sqrt{r}\sqrt{\ln(n/r)}$$

for all $A \in \mathcal{A}$.

Proof. We outline the argument which leaves room to spare. Let $A_1,\ldots,A_n$ denote
the sets of $\mathcal{A}$. Let $\chi : \Omega \to \{-1,+1\}$ be random. For $1 \le i \le n$ define

$$b_i = \text{nearest integer to } \frac{\chi(A_i)}{20\sqrt{r \ln(n/r)}} .$$

Now the probability that $b_i = 1$ is less than $(r/n)^{50}$. The entropy $H(b_i)$ is dominated
by this term and is less than

$$2\left(\frac{r}{n}\right)^{50}\left[-\log_2\left(\left(\frac{r}{n}\right)^{50}\right)\right] < 10^{-100}\,\frac{r}{n} .$$

The entropy of $(b_1,\ldots,b_n)$ is then less than $10^{-100}r$; one finds nearly antipodal
$\chi_1, \chi_2$ with the same b's and takes $\chi = (\chi_1 - \chi_2)/2$ as before. ∎


Proof [Theorem 13.2.1]. Apply Lemma 13.2.2 to find a partial coloring $\chi^1$ and then
apply Lemma 13.2.3 repeatedly on the remaining uncolored points giving $\chi^2, \chi^3, \ldots$
until all points have been colored. Let $\chi$ denote the final coloring. For any $A \in \mathcal{A}$,

$$\chi(A) = \chi^1(A) + \chi^2(A) + \cdots$$

so that

$$|\chi(A)| \le 10\sqrt{n} + 10\sqrt{10^{-9}n}\sqrt{\ln 10^{9}} + 10\sqrt{10^{-49}n}\sqrt{\ln 10^{49}} + 10\sqrt{10^{-89}n}\sqrt{\ln 10^{89}} + \cdots .$$

Removing the common $\sqrt{n}$ term gives a clearly convergent infinite series, strongly
dominated by the first term so that

$$|\chi(A)| \le 11\sqrt{n}$$

with room to spare. ∎
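
To see how much room there is, one can sum the coefficient of $\sqrt{n}$ in the series numerically. The sketch below follows the exponents 9, 49, 89, ... appearing in the displayed bound; the function name and the truncation after six correction terms are our own choices (later terms are far below floating-point resolution).

```python
import math


def series_coefficient(rounds=6):
    """Coefficient of sqrt(n) in the bound on |chi(A)|:
    10 + sum over e = 9, 49, 89, ... of 10 * sqrt(10^-e) * sqrt(ln 10^e)."""
    total = 10.0
    exponent = 9
    for _ in range(rounds):
        frac = 10.0 ** (-exponent)
        total += 10 * math.sqrt(frac) * math.sqrt(math.log(1 / frac))
        exponent += 40
    return total


if __name__ == "__main__":
    print(series_coefficient())  # barely above 10, well below 11
```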


Suppose that $\mathcal{A}$ consists of n sets on r points and $r \le n$. We can apply
Lemma 13.2.3 repeatedly (first applying Lemma 13.2.2 if $r > 10^{-9}n$) to give a
coloring $\chi$ with

$$\mathrm{disc}(\mathcal{A}, \chi) \le K\sqrt{r}\sqrt{\ln(n/r)} ,$$

where K is an absolute constant. As long as $r = n^{1-o(1)}$ this improves the random
coloring result of Theorem 13.1.1.


13.3 LINEAR AND HEREDITARY DISCREPANCY 


We now suppose that $\mathcal{A}$ has more points than sets. We write $\mathcal{A} = \{A_1,\ldots,A_n\}$ and
$\Omega = \{1,\ldots,m\}$ and assume $m \ge n$. Note that $\mathrm{disc}(\mathcal{A}) \le K$ is equivalent to the
existence of a set S; namely, $S = \{j : \chi(j) = +1\}$, with $|S \cap A|$ within K/2 of
|A|/2 for all $A \in \mathcal{A}$. We define the linear discrepancy $\mathrm{lindisc}(\mathcal{A})$ by

$$\mathrm{lindisc}(\mathcal{A}) = \max_{p_1,\ldots,p_m \in [0,1]} \ \min_{\epsilon_1,\ldots,\epsilon_m \in \{0,1\}} \ \max_{A \in \mathcal{A}} \left| \sum_{i \in A} (\epsilon_i - p_i) \right| .$$

The upper bound $\mathrm{lindisc}(\mathcal{A}) \le K$ means that given any $p_1,\ldots,p_m$ there is a
"simultaneous roundoff" $\epsilon_1,\ldots,\epsilon_m$ so that, with $S = \{j : \epsilon_j = 1\}$, $|S \cap A|$ is within
K of the weighted sum $\sum_{j \in A} p_j$ for all $A \in \mathcal{A}$. Taking all $p_j = \frac{1}{2}$, the upper bound
implies $\mathrm{disc}(\mathcal{A}) \le 2K$. But $\mathrm{lindisc}(\mathcal{A}) \le K$ is much stronger. It implies, taking
all $p_j = \frac{1}{3}$, the existence of an S with all $|S \cap A|$ within K of |A|/3, and much
more. Linear discrepancy and its companion hereditary discrepancy defined below
have been developed in Lovász, Spencer and Vesztergombi (1986). For $X \subseteq \Omega$ let
$\mathcal{A}|_X$ denote the restriction of $\mathcal{A}$ to X, that is, the family $\{A \cap X : A \in \mathcal{A}\}$. The next
result "reduces" the bounding of $\mathrm{disc}(\mathcal{A})$ when there are more points than sets to the
bounding of $\mathrm{lindisc}(\mathcal{A})$ when the points do not outnumber the sets.


Theorem 13.3.1 Let $\mathcal{A}$ be a family of n sets on m points with $m \ge n$. Suppose that
$\mathrm{lindisc}(\mathcal{A}|_X) \le K$ for every subset X of at most n points. Then $\mathrm{lindisc}(\mathcal{A}) \le K$.


Proof. Let $p_1,\ldots,p_m \in [0,1]$ be given. We define a reduction process. Call index
j fixed if $p_j \in \{0,1\}$, otherwise call it floating, and let F denote the set of floating
indices. If $|F| \le n$ then halt. Otherwise, let $y_j$, $j \in F$, be a nonzero solution to the
homogeneous system

$$\sum_{j \in A \cap F} y_j = 0, \quad A \in \mathcal{A} .$$

Such a solution exists since there are more variables (|F|) than equations (n) and
may be found by standard techniques of linear algebra. Now set

$$p'_j = \begin{cases} p_j + \lambda y_j, & j \in F, \\ p_j, & j \notin F, \end{cases}$$

where we let $\lambda$ be the real number of least absolute value so that for some $j \in F$ the
value $p'_j$ becomes zero or one. Critically,

$$\sum_{j \in A} p'_j = \sum_{j \in A} p_j + \lambda \sum_{j \in A \cap F} y_j = \sum_{j \in A} p_j \qquad (13.1)$$

for all $A \in \mathcal{A}$. Now iterate this process with the new $p'_j$. At each iteration at least
one floating j becomes fixed and so the process eventually halts at some $p^*_1,\ldots,p^*_m$.
Let X be the set of floating j at this point. Then $|X| \le n$. By assumption there exist
$\epsilon_j$, $j \in X$ so that

$$\left| \sum_{j \in A \cap X} (p^*_j - \epsilon_j) \right| \le K, \quad A \in \mathcal{A} .$$

For $j \notin X$ set $\epsilon_j = p^*_j$. As (13.1) holds at each iteration,

$$\sum_{j \in A} p^*_j = \sum_{j \in A} p_j$$

and hence

$$\left| \sum_{j \in A} (p_j - \epsilon_j) \right| = \left| \sum_{j \in A} (p_j - p^*_j) + \sum_{j \in A \cap X} (p^*_j - \epsilon_j) \right| \le K$$

for all $A \in \mathcal{A}$. ∎


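We now define the hereditary discrepancy $\mathrm{herdisc}(\mathcal{A})$ by

$$\mathrm{herdisc}(\mathcal{A}) = \max_{X \subseteq \Omega} \mathrm{disc}(\mathcal{A}|_X) .$$


Example. Let $\Omega = \{1,\ldots,n\}$ and let $\mathcal{A}$ consist of all intervals $[i,j] = \{i, i+1, \ldots, j\}$
with $1 \le i \le j \le n$. Then $\mathrm{disc}(\mathcal{A}) = 1$ as we may color $\Omega$ alternately
+1 and -1. But also $\mathrm{herdisc}(\mathcal{A}) = 1$. For given any $X \subseteq \Omega$, say, with elements
$x_1 < x_2 < \cdots < x_r$, we may color X alternately by $\chi(x_k) = (-1)^k$. For any set
$[i,j] \in \mathcal{A}$ the elements of $[i,j] \cap X$ are alternately colored.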
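
The example is easy to confirm by machine for small n: for every nonempty X the alternating coloring of X gives every restricted interval a sum in {-1, 0, +1}. The sketch below checks this; the function name `alternating_disc` is ours.

```python
from itertools import combinations


def alternating_disc(n, X):
    """disc(A|_X, chi) for the intervals of {1,...,n} restricted to X, where
    chi colors the elements x_1 < x_2 < ... of X by chi(x_k) = (-1)^k."""
    xs = sorted(X)
    chi = {x: (-1) ** k for k, x in enumerate(xs, start=1)}
    worst = 0
    for i in range(1, n + 1):
        for j in range(i, n + 1):
            value = sum(chi[a] for a in xs if i <= a <= j)
            worst = max(worst, abs(value))
    return worst


if __name__ == "__main__":
    n = 6
    values = {alternating_disc(n, X)
              for size in range(1, n + 1)
              for X in combinations(range(1, n + 1), size)}
    print(values)  # the set of discrepancies seen over all nonempty X
```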


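Theorem 13.3.2 $\mathrm{lindisc}(\mathcal{A}) \le \mathrm{herdisc}(\mathcal{A})$.


Proof. Set $K = \mathrm{herdisc}(\mathcal{A})$. Let $\mathcal{A}$ be defined on $\Omega = \{1,\ldots,m\}$ and let
$p_1,\ldots,p_m \in [0,1]$ be given. First let us assume that all $p_i$ have finite expansions
when written in base two. Let T be the minimal integer so that all $p_i 2^T \in Z$. Let J
be the set of i for which $p_i$ has a one in the Tth digit of its binary expansion, that is,
for which $p_i 2^{T-1} \notin Z$. As $\mathrm{disc}(\mathcal{A}|_J) \le K$ there exist $\epsilon_j \in \{-1,+1\}$, so that

$$\left| \sum_{j \in J \cap A} \epsilon_j \right| \le K$$

for all $A \in \mathcal{A}$. Write $p_j = p_j^{(T)}$. Now set

$$p_j^{(T-1)} = \begin{cases} p_j^{(T)} & \text{if } j \notin J, \\ p_j^{(T)} + \epsilon_j 2^{-T} & \text{if } j \in J. \end{cases}$$

That is, the $p_j^{(T-1)}$ are the "roundoffs" of the $p_j^{(T)}$ in the Tth place. Note that all
$p_j^{(T-1)} 2^{T-1} \in Z$. For any $A \in \mathcal{A}$,

$$\left| \sum_{j \in A} \left( p_j^{(T-1)} - p_j^{(T)} \right) \right| = \left| \sum_{j \in J \cap A} \epsilon_j 2^{-T} \right| \le 2^{-T} K .$$

Iterate this procedure, finding $p_j^{(T-2)}, \ldots, p_j^{(1)}, p_j^{(0)}$. All $p_j^{(0)} 2^0 \in Z$ so all $p_j^{(0)} \in
\{0,1\}$ and

$$\left| \sum_{j \in A} \left( p_j^{(0)} - p_j^{(T)} \right) \right| \le \sum_{i=1}^{T} \left| \sum_{j \in A} \left( p_j^{(i-1)} - p_j^{(i)} \right) \right| \le \sum_{i=1}^{T} 2^{-i} K \le K ,$$

as desired.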

What about general $p_1,\ldots,p_m \in [0,1]$? We can be flip and say that, at least to a
computer scientist, all real numbers have finite binary expansions. More rigorously,
the function

$$f(p_1,\ldots,p_m) = \min_{\epsilon_1,\ldots,\epsilon_m \in \{0,1\}} \ \max_{A \in \mathcal{A}} \left| \sum_{i \in A} (\epsilon_i - p_i) \right|$$

is the finite minimum of finite maxima of continuous functions and thus is continuous.
The set of $(p_1,\ldots,p_m) \in [0,1]^m$ with all $p_i 2^T \in Z$ for some T is a dense subset of
$[0,1]^m$. As $f \le K$ on this dense set, $f \le K$ for all $(p_1,\ldots,p_m) \in [0,1]^m$. ∎


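Corollary 13.3.3 Let $\mathcal{A}$ be a family of n sets on m points. Suppose $\mathrm{disc}(\mathcal{A}|_X) \le K$
for every subset X with at most n points. Then $\mathrm{disc}(\mathcal{A}) \le 2K$.


Proof. For every $X \subseteq \Omega$ with $|X| \le n$, $\mathrm{herdisc}(\mathcal{A}|_X) \le K$ so by Theorem 13.3.2
$\mathrm{lindisc}(\mathcal{A}|_X) \le K$. By Theorem 13.3.1 $\mathrm{lindisc}(\mathcal{A}) \le K$. But

$$\mathrm{disc}(\mathcal{A}) \le 2\,\mathrm{lindisc}(\mathcal{A}) \le 2K . \qquad ∎$$


Corollary 13.3.4 For any family $\mathcal{A}$ of n sets of arbitrary size

$$\mathrm{disc}(\mathcal{A}) \le 12\sqrt{n} .$$

Proof. Apply Theorem 13.2.1 and Corollary 13.3.3. ∎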


13.4 LOWER BOUNDS 


We now give two quite different proofs that, up to a constant factor, Corollary 13.3.4
is the best possible. A Hadamard matrix is a square matrix $H = (h_{ij})$ with all
$h_{ij} \in \{-1,+1\}$ and with row vectors mutually orthogonal (and hence with column
vectors mutually orthogonal). Let H be a Hadamard matrix of order n and let
$v = (v_1,\ldots,v_n)$, $v_i \in \{-1,+1\}$. Then

$$Hv = v_1 c_1 + \cdots + v_n c_n ,$$

where $c_i$ denotes the ith column vector of H. Writing $Hv = (L_1,\ldots,L_n)$ and
letting |c| denote the usual Euclidean norm,

$$L_1^2 + \cdots + L_n^2 = |Hv|^2 = v_1^2 |c_1|^2 + \cdots + v_n^2 |c_n|^2 = n + \cdots + n = n^2$$

since the $c_i$ are mutually orthogonal. Hence some $L_i^2 \ge n$ and thus

$$|Hv|_\infty = \max\{|L_1|,\ldots,|L_n|\} \ge \sqrt{n} .$$


Now we transfer this result to one on families of sets. Let H be a Hadamard
matrix of order n with first row and first column all ones. (Any Hadamard matrix
can be so "normalized" by multiplying appropriate rows and columns by -1.) Let J
denote the all ones matrix of order n. Let $v, L_1, \ldots, L_n$ be as above. Then

$$L_1 + \cdots + L_n = \sum_{i,j=1}^{n} v_j h_{ij} = \sum_{j=1}^{n} v_j \sum_{i=1}^{n} h_{ij} = n v_1 = \pm n$$

since the first column sums to n but the other columns, being orthogonal to it, sum
to zero. Set $\lambda = v_1 + \cdots + v_n$, so that $Jv = (\lambda,\ldots,\lambda)$ and

$$(H + J)v = (L_1 + \lambda, \ldots, L_n + \lambda) .$$

We calculate

$$|(H + J)v|^2 = \sum_{i=1}^{n} (L_i + \lambda)^2 = \sum_{i=1}^{n} (L_i^2 + 2\lambda L_i + \lambda^2) = n^2 \pm 2n\lambda + n\lambda^2 .$$

Assume n is even. (Hadamard matrices don't exist for odd n, except n = 1.) Then
$\lambda$ is an even integer. The quadratic (in $\lambda$) $n^2 \pm 2n\lambda + n\lambda^2$ has a minimum at $\mp 1$ and
so under the restriction of being an even integer its minimum is at $\lambda = 0, \mp 2$ and so

$$|(H + J)v|^2 \ge n^2 .$$

Again, some coordinate must be at least $\sqrt{n}$. Setting $H^* = (H + J)/2$,

$$|H^* v|_\infty \ge \sqrt{n}/2 .$$


Let $\mathcal{A} = \{A_1,\ldots,A_m\}$ be any family of subsets of $\Omega = \{1,\ldots,n\}$ and let M
denote the corresponding m x n incidence matrix. A coloring $\chi : \Omega \to \{-1,+1\}$
corresponds to a vector $v = (\chi(1),\ldots,\chi(n)) \in \{-1,+1\}^n$. Then

$$\mathrm{disc}(\mathcal{A}, \chi) = |Mv|_\infty$$

and

$$\mathrm{disc}(\mathcal{A}) = \min_{v \in \{-1,+1\}^n} |Mv|_\infty .$$

In our case $H^*$ has entries 0, 1. Thus we have the following theorem.


Theorem 13.4.1 If a Hadamard matrix exists of order n > 1 then there exists a
family $\mathcal{A}$ consisting of n subsets of an n-set with

$$\mathrm{disc}(\mathcal{A}) \ge \sqrt{n}/2 .$$
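
For orders n = 4 and n = 8 both bounds can be verified exhaustively. The sketch below uses the Sylvester doubling construction, which is one standard way (not the only one, and not one discussed in the text) to obtain a normalized Hadamard matrix of order $2^k$; the function names are our own.

```python
from itertools import product


def sylvester(k):
    """Hadamard matrix of order 2^k by repeated doubling [[H, H], [H, -H]];
    its first row and first column are all ones."""
    H = [[1]]
    for _ in range(k):
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H


def to_incidence(H):
    """H* = (H + J) / 2, a 0/1 matrix."""
    return [[(h + 1) // 2 for h in row] for row in H]


def min_sup_norm(M):
    """min over v in {-1,+1}^n of |Mv|_inf, by exhaustive search."""
    n = len(M[0])
    return min(
        max(abs(sum(row[j] * v[j] for j in range(n))) for row in M)
        for v in product((-1, 1), repeat=n)
    )


if __name__ == "__main__":
    for k in (2, 3):
        H = sylvester(k)
        print(len(H), min_sup_norm(H), min_sup_norm(to_incidence(H)))
```

The last printed value in each row is the discrepancy of the set family whose incidence matrix is $H^*$.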
While it is not known precisely for which n a Hadamard matrix exists [the
Hadamard conjecture is that they exist for n = 1, 2 and all multiples of 4; see,
for example, Hall (1986)], it is known that the orders of Hadamard matrices are
dense in the sense that for all $\epsilon$ if n is sufficiently large there will exist a Hadamard
matrix of order between n and $n(1 - \epsilon)$. This result suffices to extend Theorem 13.4.1
to an asymptotic result on all n.

Our second argument for the existence of $\mathcal{A}$ with high discrepancy involves turning
the probabilistic argument "on its head." Let M be a random 0, 1 matrix of order n.
Let $v = (v_1,\ldots,v_n)$, $v_j = \pm 1$ be fixed and set $Mv = (L_1,\ldots,L_n)$. Suppose half
of the $v_j = +1$ and half are -1. Then

$$L_i \sim B\left(\frac{n}{2}, \frac{1}{2}\right) - B\left(\frac{n}{2}, \frac{1}{2}\right) ,$$

which has roughly the normal distribution $N(0, \sqrt{n}/2)$. Pick $\lambda > 0$ so that

$$\int_{-\lambda}^{\lambda} \frac{1}{\sqrt{2\pi}} e^{-t^2/2} \, dt < \frac{1}{2} .$$

Then

$$\Pr[|L_i| < \lambda\sqrt{n}/2] < \frac{1}{2} .$$

When v is imbalanced the same inequality holds; we omit the details. Now, crucially,
the $L_i$ are mutually independent as each entry of M was independently chosen. Thus

$$\Pr[|L_i| < \lambda\sqrt{n}/2 \text{ for all } 1 \le i \le n] < \left(\frac{1}{2}\right)^n .$$

There are "only" $2^n$ possible v. Thus the expected number of v for which $|Mv|_\infty <
\lambda\sqrt{n}/2$ is less than $2^n 2^{-n} = 1$. For some M this value must be zero, there are no
such v. The corresponding family $\mathcal{A}$ thus has

$$\mathrm{disc}(\mathcal{A}) \ge \lambda\sqrt{n}/2 .$$


13.5 THE BECK-FIALA THEOREM 


For any family $\mathcal{A}$ let $\deg(\mathcal{A})$ denote the maximal number of sets containing any
particular point. The following result due to Beck and Fiala (1981) uses only methods
from linear algebra and thus is technically outside the scope we have set for this book.
We include it both for the sheer beauty of the proof and because the result itself is
very much in the spirit of this chapter.


Theorem 13.5.1 Let $\mathcal{A}$ be a finite family of finite sets, no restriction on either the
number of sets or on the cardinality of the sets, with $\deg(\mathcal{A}) \le t$. Then

$$\mathrm{disc}(\mathcal{A}) \le 2t - 1 .$$
Proof. For convenience write $\mathcal{A} = \{S_1,\ldots,S_m\}$ with all $S_i \subseteq \Omega = \{1,\ldots,n\}$.
To each $j \in \Omega$ there is assigned a value $x_j$ that will change as the proof progresses.
Initially all $x_j = 0$. At the end all $x_j = \pm 1$. We will have $-1 \le x_j \le +1$ at all
times and once $x_j = \pm 1$ it "sticks" there and that becomes its final value. A set
$S_i$ has value $\sum_{j \in S_i} x_j$. At any time j is called fixed if $x_j = \pm 1$; otherwise it is
floating. A set $S_i$ is safe if it has at most t floating points; otherwise it is active.
Note, crucially, that as points are in at most t sets and active sets contain more than t
floating points there must be fewer active sets than floating points.

We insist at all times that all active sets have value zero. This holds initially
since all sets have value zero. Suppose this condition holds at some stage. Consider
$x_j$ a variable for each floating j and a constant for each fixed j. The condition
that $S_i$ has value zero then becomes a linear equation in these variables. This is an
underdetermined system: there are fewer linear conditions (active sets) than variables
(floating points). Hence we may find a line, parametrized

$$x'_j = x_j + \lambda y_j , \quad j \text{ floating},$$

on which the active sets retain value zero. Let $\lambda$ be the smallest positive value for which
some $x'_j$ becomes $\pm 1$ and replace each $x_j$ by $x'_j$. (Geometrically, follow the line
until reaching the boundary of the cube in the space over the floating variables.) This
process has left fixed variables fixed and so safe sets stayed safe sets (though active
sets may have become safe) and so the condition still holds. In addition, at least one
previously floating j has become fixed.

We iterate the above procedure until all j have become fixed. (Toward the end
we may have no active sets at which time we may simply set the floating $x_j$ to $\pm 1$
arbitrarily.) Now consider any set $S_i$. Initially it had value zero and it retained value
zero while it contained more than t floating points. Consider the time when it first
becomes safe, say, $1,\ldots,\ell$ were its floating points, so that $\ell \le t$. At this moment its value is zero.
The variables $x_1,\ldots,x_\ell$ can now change less than two to their final value since all
values are in [-1, +1]. Thus, in total, they may change less than 2t. Hence the final
value of $S_i$ is less than 2t in absolute value and, as it is an integer, it is at most 2t - 1. ∎
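
The proof is constructive, and the procedure can be sketched in a few lines with exact rational arithmetic. The code below is an illustration under our own conventions (points labeled 0,...,n-1, a set is "active" when it has more than t floating points, and the helper names `null_vector` and `beck_fiala` are hypothetical); it is not tuned for efficiency.

```python
from fractions import Fraction


def null_vector(rows, ncols):
    """A nonzero rational y with rows . y = 0; requires rank(rows) < ncols."""
    A = [[Fraction(v) for v in row] for row in rows]
    pivots, r = [], 0
    for c in range(ncols):
        p = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if p is None:
            continue
        A[r], A[p] = A[p], A[r]
        A[r] = [v / A[r][c] for v in A[r]]          # scale pivot to 1
        for i in range(len(A)):
            if i != r and A[i][c] != 0:
                f = A[i][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        pivots.append(c)
        r += 1
    free = next(c for c in range(ncols) if c not in pivots)
    y = [Fraction(0)] * ncols
    y[free] = Fraction(1)
    for i, c in enumerate(pivots):
        y[c] = -A[i][free]                          # read off from reduced rows
    return y


def beck_fiala(sets, n):
    """Return (chi, t): a +-1 coloring of {0,...,n-1} and the maximum degree t."""
    t = max(sum(1 for S in sets if j in S) for j in range(n))
    x = [Fraction(0)] * n
    while True:
        floating = [j for j in range(n) if abs(x[j]) != 1]
        if not floating:
            break
        active = [S for S in sets if sum(1 for j in floating if j in S) > t]
        if not active:
            for j in floating:                      # arbitrary final rounding
                x[j] = Fraction(1)
            break
        rows = [[1 if j in S else 0 for j in floating] for S in active]
        y = null_vector(rows, len(floating))
        # Smallest positive step that pushes some floating variable to +-1.
        lam = min(((1 - x[j]) / yj) if yj > 0 else ((1 + x[j]) / (-yj))
                  for j, yj in zip(floating, y) if yj != 0)
        for j, yj in zip(floating, y):
            x[j] += lam * yj
    return [int(v) for v in x], t
```

Because the arithmetic is exact, a variable that reaches the boundary is exactly $\pm 1$ and is recognized as fixed in the next round.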


Conjecture 13.5.2 If $\deg(\mathcal{A}) \le t$ then $\mathrm{disc}(\mathcal{A}) \le K\sqrt{t}$, K an absolute constant.


This conjecture seems to call for a melding of probabilistic methods and linear
algebra. The constructions of t sets on t points, described in Section 13.4, show that,
if true, this conjecture would be the best possible.


13.6 EXERCISES 


1. Let $\mathcal{A}$ be a family of n subsets of $\Omega = \{1,\ldots,m\}$ with m even. Let $\chi(i)$,
$1 \le i \le m/2$ be independent and uniform in $\{-1,+1\}$ and set $\chi(i + m/2) =
-\chi(i)$ for $1 \le i \le m/2$. Using this notion of random coloring, improve
Theorem 13.1.1 by showing $\mathrm{disc}(\mathcal{A}) \le \sqrt{m \ln(2n)}$. Show that this can be
improved even further by splitting $\Omega$ randomly into m/2 disjoint pairs.

2. Let $v_1,\ldots,v_s \in R^n$. Let $x_1,\ldots,x_s \in [-1,+1]$ such that $\sum_{i=1}^{s} x_i v_i = 0$ and
such that $x_i \in \{-1,+1\}$ for all but at most n values of i. Let $v_{s+1} \in R^n$.
Use the linear ideas of Section 13.5 to find $x'_1,\ldots,x'_s,x'_{s+1}$ with the following
properties:

• $\sum_{i=1}^{s+1} x'_i v_i = 0$.

• All $x'_i \in [-1,+1]$.

• $x'_i \in \{-1,+1\}$ for all but at most n values of i.

• $x'_i = x_i$ whenever $x_i \in \{-1,+1\}$.

Use the above to prove the following result of Bárány and Grinberg: Let |·|
be an arbitrary norm in $R^n$. Let $v_1,\ldots,v_s \in R^n$ with all $|v_i| \le 1$. Then there
exist $x_1,\ldots,x_s \in \{-1,+1\}$ such that

$$\left| \sum_{i=1}^{t} x_i v_i \right| \le 2n$$

for all $1 \le t \le s$.


3. Let $A_1,\ldots,A_n \subseteq \Omega = \{1,\ldots,m\}$ with $m \sim n \ln n$. Assume further that
all $|A_i| \le n$. Use the methods of Theorem 13.2.1, including Kleitman's
Theorem, to prove that there exists $\chi : \{1,\ldots,m\} \to \{-1,0,+1\}$ such that
all $\chi(A_i) = O(\sqrt{n \ln\ln n})$ and $\chi(x) = 0$ for at most n vertices x. Use
Theorem 13.2.1 to deduce the existence of $\chi : \{1,\ldots,m\} \to \{-1,+1\}$ such
that all $\chi(A_i) = O(\sqrt{n \ln\ln n})$.


THE PROBABILISTIC LENS: 
Unbalancing Lights 


For any m x n matrix $B = (b_{ij})$ with coefficients $b_{ij} = \pm 1$ set

$$F[B] = \max_{x_i, y_j = \pm 1} \sum_{i=1}^{m} \sum_{j=1}^{n} x_i y_j b_{ij} .$$

As in Section 2.5 we may interpret B as an m x n array of lights, each either on
($b_{ij} = +1$) or off ($b_{ij} = -1$). For each row and each column there is a switch
that, when pulled, changes all lights in that line from on to off or from off to on.
Then F[B] gives the maximal achievable number of lights on minus lights off. In
Section 2.5 we found a lower bound for F[B] when m = n. Here we set $n = 2^m$
and find the precise best possible lower bound.

With $n = 2^m$ let A be an m x n matrix with coefficients $\pm 1$ containing every
possible column vector precisely once. We claim F[A] is the minimal value of F[B]
over all m x n matrices B.

For any given B let $x_1,\ldots,x_m = \pm 1$ be independently and uniformly chosen
and set

$$X_j = \sum_{i=1}^{m} x_i b_{ij} ,$$
$$X = |X_1| + \cdots + |X_n| ,$$

so that

$$F[B] = \max_{x_i, y_j = \pm 1} \sum_{j=1}^{n} y_j X_j = \max_{x_i = \pm 1} \sum_{j=1}^{n} |X_j| = \max X .$$

Regardless of the $b_{ij}$, $X_j$ has distribution $S_m$ so that $E[|X_j|] = E[|S_m|]$ and, by
linearity of expectation,

$$E[X] = nE[|S_m|] .$$

With B = A, any choices of $x_1,\ldots,x_m = \pm 1$ have the effect of permuting the
columns (the matrix $(x_i a_{ij})$ also has every column vector precisely once) so
that $X = |X_1| + \cdots + |X_n|$ is a constant. Note that E[X] is independent of B. In
general, fixing $E[X] = \mu$, the minimal possible value for max X is achieved when
X is the constant $\mu$. Thus F[B] is minimized with B = A.
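
For m = 3 and n = 8 the claim can be tested directly: F[A] equals $nE[|S_3|] = 12$, and no 3 x 8 matrix of signs does worse for the player. The sketch below uses our own function names and evaluates F by maximizing $\sum_j |X_j|$ over the row signs, as in the identity $F[B] = \max X$.

```python
from itertools import product


def F(B):
    """F[B] = max over row signs x of sum_j |sum_i x_i b_ij|."""
    m, n = len(B), len(B[0])
    return max(
        sum(abs(sum(x[i] * B[i][j] for i in range(m))) for j in range(n))
        for x in product((-1, 1), repeat=m)
    )


def all_columns_matrix(m):
    """The m x 2^m matrix whose columns are all +-1 vectors, each once."""
    cols = list(product((-1, 1), repeat=m))
    return [[col[i] for col in cols] for i in range(m)]


if __name__ == "__main__":
    A = all_columns_matrix(3)
    print(F(A))                                   # n * E|S_3| = 8 * 3/2
    print(F([[1] * 8 for _ in range(3)]))         # all lights on: 24
```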


Geometry


Few people think more than two or three times a year. I have made an interna- 
tional reputation for myself by thinking once or twice a week. 


— George Bernard Shaw 


Suppose we choose randomly n points $P_1,\ldots,P_n$ on the unit circle, according to
a uniform distribution. What is the probability that the origin lies in the convex
hull of these points? There is a surprisingly simple (yet clever) way to compute this
probability. Let us first choose n random pairs of antipodal points $Q_1, Q_{n+1} = -Q_1$,
$Q_2, Q_{n+2} = -Q_2, \ldots, Q_n, Q_{2n} = -Q_n$, according to a uniform distribution. Note
that with probability 1 these pairs are all distinct. Next we choose each $P_i$ to
be either $Q_i$ or its antipodal $Q_{n+i} = -Q_i$, where each choice is equally likely.
Clearly this corresponds to a random choice of the points $P_i$. The probability that
the origin does not belong to the convex hull of the points $P_i$, given the (distinct)
points $Q_j$, is precisely $x/2^n$, where x is the number of subsets of the points $Q_j$
contained in an open half-plane determined by a line through the origin, which
does not pass through any of the points $Q_j$. It is easy to see that x = 2n. This
is because if we renumber the points $Q_j$ so that their cyclic order on the circle is
$Q_1,\ldots,Q_n,Q_{n+1},\ldots,Q_{2n}$ and $Q_{n+i} = -Q_i$ then the subsets contained in such
half-planes are precisely $\{Q_i,\ldots,Q_{n+i-1}\}$, where the indices are reduced modulo
2n. Therefore the probability that the origin is in the convex hull of n randomly
chosen points on the unit circle is precisely $1 - 2n/2^n$. Observe that the same result
holds if we replace the unit circle by any centrally symmetric bounded planar domain
with center 0 and that the argument can easily be generalized to higher dimensions.
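
The formula $1 - 2n/2^n$ is easy to test by simulation. In the sketch below (function names and trial counts are our own choices) the origin is declared inside the hull exactly when no gap between cyclically consecutive points exceeds a half-turn, which agrees with the half-plane criterion above up to events of probability zero.

```python
import math
import random


def origin_in_hull(angles):
    """True if the origin is in the convex hull of the points at these angles,
    i.e. if every gap between cyclically consecutive angles is below pi."""
    a = sorted(angles)
    gaps = [b - c for b, c in zip(a[1:], a[:-1])]
    gaps.append(a[0] + 2 * math.pi - a[-1])       # wrap-around gap
    return max(gaps) < math.pi


def estimate_hull_probability(n, trials, rng):
    """Monte Carlo estimate for n uniform points on the unit circle."""
    hits = sum(
        origin_in_hull([rng.uniform(0, 2 * math.pi) for _ in range(n)])
        for _ in range(trials)
    )
    return hits / trials


def wendel(n):
    """The exact planar value 1 - 2n / 2^n."""
    return 1 - 2 * n / 2 ** n


if __name__ == "__main__":
    rng = random.Random(12345)
    for n in (3, 4, 6):
        print(n, wendel(n), estimate_hull_probability(n, 20000, rng))
```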

This result, due to Wendel (1962), shows how in some cases a clever idea can 
replace a tedious computation. It also demonstrates the connection between proba- 
bility and geometry. The probabilistic method has recently been used extensively for 
deriving results in discrete and computational geometry. Some of these results are 
described in this chapter. 
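Wendel's formula $1 - 2n/2^n$ is easy to test numerically. The following is a minimal sketch, not part of the original argument; the function names are our own. It uses the fact that the origin lies outside the convex hull exactly when all points fit in an open half-plane through the origin, that is, when some angular gap between cyclically consecutive points exceeds $\pi$.

```python
import math
import random


def wendel_probability(n):
    """Closed form 1 - 2n/2^n for n uniform points on the unit circle."""
    return 1 - 2 * n / 2 ** n


def origin_in_hull(angles):
    """True if the origin is in the convex hull of the points with these angles.

    The origin is outside the hull exactly when some cyclic angular gap
    between consecutive points is larger than pi.
    """
    s = sorted(angles)
    gaps = [b - a for a, b in zip(s, s[1:])]
    gaps.append(2 * math.pi - s[-1] + s[0])  # wrap-around gap
    return max(gaps) <= math.pi


def estimate(n, trials, seed=0):
    """Monte Carlo estimate of the probability (seed fixed for repeatability)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        angles = [rng.uniform(0, 2 * math.pi) for _ in range(n)]
        hits += origin_in_hull(angles)
    return hits / trials
```

For instance, `wendel_probability(3)` is 1/4, and `estimate(4, 20000)` should be close to `wendel_probability(4)`, which is 1/2.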


14.1 THE GREATEST ANGLE AMONG POINTS IN EUCLIDEAN SPACES


There are several striking examples, in different areas of combinatorics, where the
probabilistic method supplies very simple counterexamples to long-standing conjectures.
Here is an example, due to Erdős and Füredi (1983).


Theorem 14.1.1 For every $d \ge 1$ there is a set of at least $\lfloor \frac{1}{2}(2/\sqrt{3})^d \rfloor$ points in the
d-dimensional Euclidean space $\mathbb{R}^d$, such that all angles determined by three points
from the set are strictly less than $\pi/2$.


This theorem disproves an old conjecture of Danzer and Grünbaum (1962) that
the maximum cardinality of such a set is at most $2d - 1$. We note that as proved by
Danzer and Grünbaum the maximum cardinality of a set of points in $\mathbb{R}^d$ in which all
angles are at most $\pi/2$ is $2^d$.


Proof. We select the points of a set $X \subset \mathbb{R}^d$ from the vertices of the d-dimensional
cube. As usual, we view the vertices of the cube, which are 0,1-vectors of length
d, as the characteristic vectors of subsets of a d-element set; that is, each 0,1-vector
a of length d is associated with the set $A = \{i : 1 \le i \le d, a_i = 1\}$. A simple
consequence of Pythagoras' Theorem gives that the three vertices a, b and c of the
d-cube, corresponding to the sets A, B and C, respectively, determine a right angle
at c if and only if

$$A \cap B \subseteq C \subseteq A \cup B. \qquad (14.1)$$

As the angles determined by triples of points of the d-cube are always at most $\pi/2$,
it suffices to construct a set X of cardinality at least the one stated in the theorem, no
three distinct members of which satisfy (14.1).

Define $m = \lfloor \frac{1}{2}(2/\sqrt{3})^d \rfloor$, and choose, randomly and independently, 2m d-dimensional
{0,1}-vectors $a_1, \ldots, a_{2m}$, where each coordinate of each of the vectors
independently is chosen to be either 0 or 1 with equal probability. For every fixed
triple a, b and c of the chosen points, the probability that the corresponding sets
satisfy equation (14.1) is precisely $(3/4)^d$. This is because (14.1) simply means that
for each i, $1 \le i \le d$, neither $a_i = b_i = 0, c_i = 1$ nor $a_i = b_i = 1, c_i = 0$ holds.
Therefore the probability that, for three fixed indices i, j and k, our chosen points $a_i$,
$a_j$, $a_k$ form a right angle at $a_k$ is $(3/4)^d$. Since there are $\binom{2m}{3} \cdot 3$ possible triples that
can produce such angles, the expected number of right angles is

$$\binom{2m}{3} \cdot 3 \cdot \left(\frac{3}{4}\right)^d \le m,$$

where the last inequality follows from the choice of m. Thus there is a choice of a set
X of 2m points in which the number of right angles is at most m. By deleting one
point from each such angle we obtain a set of at least $2m - m = m$ points in which
all angles are strictly less than $\pi/2$. Note that the remaining points are all distinct
since (14.1) is trivially satisfied if $A = C$. This completes the proof. ■
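The proof above is constructive up to the random choice, and can be sketched in code. This is an illustrative sketch under our own conventions: cube vertices are stored as integer bitmasks, the helper names are hypothetical, and a single random sample is used, so the output is only guaranteed to have all angles acute, not to contain at least m points.

```python
import itertools
import math
import random


def satisfies_14_1(a, b, c):
    """Condition (14.1) for bitmask-encoded sets: A & B inside C inside A | B."""
    return ((a & b) & ~c) == 0 and (c & ~(a | b)) == 0


def acute_set(d, seed=0):
    """Sample 2m random cube vertices, then delete one point per right angle."""
    rng = random.Random(seed)
    m = math.floor(0.5 * (2 / math.sqrt(3)) ** d)
    points = [rng.getrandbits(d) for _ in range(2 * m)]
    alive = set(range(len(points)))
    for i, j, k in itertools.permutations(range(len(points)), 3):
        # (i, j) is an unordered pair; k is the vertex of the candidate right angle
        if i < j and {i, j, k} <= alive:
            if satisfies_14_1(points[i], points[j], points[k]):
                alive.discard(k)
    return m, [points[i] for i in sorted(alive)]
```

Each deletion removes one point of a triple satisfying (14.1), so at most one point is lost per right angle, as in the proof. Repeated points are removed as well, since (14.1) holds trivially when $A = C$.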


It is worth noting that, as observed by Erdős and Füredi, the proof above can be
easily modified to give the following.


Theorem 14.1.2 For every $\varepsilon > 0$ there is a $\delta > 0$ such that for every $d \ge 1$ there
is a set of at least $(1 + \delta)^d$ points in $\mathbb{R}^d$ so that all the angles determined by three
distinct points from the set are at most $\pi/3 + \varepsilon$.


We omit the detailed proof of this result. 


14.2 EMPTY TRIANGLES DETERMINED BY POINTS IN THE PLANE 


For a finite set X of points in general position in the plane, let f(X) denote the 
number of empty triangles determined by triples of points of X; that is, the number 
of triangles, determined by points of X, that contain no other point of X. Katchalski 
and Meir (1988) studied the minimum possible value of f(X) for a set X of n points. 
Define $f(n) = \min f(X)$, where X ranges over all planar sets of n points in general
position (i.e., containing no three collinear points). They proved that

$$\binom{n-1}{2} \le f(n) < 200n^2.$$

These bounds were improved by Bárány and Füredi (1987), who showed that as n
grows

$$(1 + o(1))n^2 \le f(n) \le (1 + o(1))2n^2.$$


The construction that establishes the upper bound is probabilistic and is given in the 
following theorem. See also Valtr (1995) for a slightly better result. 


Theorem 14.2.1 Let $I_1, I_2, \ldots, I_n$ be parallel unit intervals in the plane, where

$$I_i = \{(x, y) : x = i, \ 0 \le y \le 1\}.$$

For each i let us choose a point $p_i$ randomly and independently from $I_i$ according to a
uniform distribution. Let X be the set consisting of these n randomly chosen points.
Then the expected number of empty triangles in X is at most $2n^2 + O(n \log n)$.

Clearly, with probability 1, X is a set of points in general position and hence the
above theorem shows that $f(n) \le 2n^2 + O(n \log n)$.


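Proof. We first estimate the probability that the triangle determined by the points
$p_i$, $p_{i+a}$ and $p_{i+k}$ is empty, for some fixed i, a and $k = a + b \ge 3$. Let $A = (i, x)$,
$B = (i + a, y)$ and $C = (i + k, z)$ be the points $p_i$, $p_{i+a}$ and $p_{i+k}$, respectively. Let
m be the distance between B and the intersection point of the segment AC with the
interval $I_{i+a}$. Since each of the points $p_j$ for $i < j < i + k$ is chosen randomly
according to a uniform distribution on $I_j$, it follows that the probability that the
triangle determined by A, B and C is empty is precisely

$$\left(1 - \frac{m}{a}\right)\left(1 - \frac{2m}{a}\right) \cdots \left(1 - \frac{(a-1)m}{a}\right)\left(1 - \frac{m}{b}\right)\left(1 - \frac{2m}{b}\right) \cdots \left(1 - \frac{(b-1)m}{b}\right)$$

$$\le \exp\left(-\binom{a}{2}\frac{m}{a} - \binom{b}{2}\frac{m}{b}\right) = \exp\left(-(k-2)\frac{m}{2}\right).$$

For every fixed choice of A and C, when the point $p_{i+a} = B$ is chosen randomly,
the probability that its distance m from the intersection of the segment AC with the
interval $I_{i+a}$ is at most d is clearly at most 2d, for all $d \ge 0$. Therefore the probability
that the triangle determined by $p_i$, $p_{i+a}$ and $p_{i+k}$ is empty is at most

$$2\int_{m=0}^{\infty} e^{-(k-2)m/2} \, dm = \frac{4}{k-2}.$$

It follows that the expected value of the total number of empty triangles is at most

$$n - 2 + \sum_{i=1}^{n-3} \sum_{k=3}^{n-i} \sum_{a=1}^{k-1} \frac{4}{k-2}$$

$$= n - 2 + 4\sum_{k=3}^{n-1} \frac{(n-k)(k-1)}{k-2}$$

$$= n - 2 + 4\sum_{k=3}^{n-1} \frac{n-k}{k-2} + 4\sum_{k=3}^{n-1} (n-k)$$

$$= 2n^2 + O(n \log n).$$

Here the term $n - 2$ counts the triangles with $k = 2$, which are always empty.
This completes the proof. ■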
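The bound in the proof can be evaluated exactly and compared with a simulation of the random construction. The sketch below is ours (all function names are hypothetical); it counts empty triangles by brute force, so it is only practical for small n.

```python
import random
from itertools import combinations


def orient(p, q, r):
    """Twice the signed area of the triangle p, q, r."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def strictly_inside(p, a, b, c):
    d1, d2, d3 = orient(a, b, p), orient(b, c, p), orient(c, a, p)
    return (d1 > 0 and d2 > 0 and d3 > 0) or (d1 < 0 and d2 < 0 and d3 < 0)


def count_empty_triangles(points):
    """Number of triangles containing no other point of the set (brute force)."""
    return sum(
        1
        for tri in combinations(points, 3)
        if not any(strictly_inside(p, *tri) for p in points if p not in tri)
    )


def random_point_set(n, rng):
    """One uniform point on each interval I_i = {(i, y) : 0 <= y <= 1}."""
    return [(i, rng.random()) for i in range(1, n + 1)]


def average_empty_triangles(n, trials, seed=0):
    rng = random.Random(seed)
    total = sum(count_empty_triangles(random_point_set(n, rng)) for _ in range(trials))
    return total / trials


def expectation_bound(n):
    """n - 2 plus the triple sum of 4/(k-2), exactly as in the proof."""
    total = n - 2
    for i in range(1, n - 2):              # i = 1, ..., n-3
        for k in range(3, n - i + 1):      # k = 3, ..., n-i
            for a in range(1, k):          # a = 1, ..., k-1
                total += 4 / (k - 2)
    return total


def expectation_bound_closed(n):
    """The same quantity after summing over i and a."""
    return n - 2 + 4 * sum((n - k) * (k - 1) / (k - 2) for k in range(3, n))
```

For small n the bound exceeds the total number $\binom{n}{3}$ of triangles, so the comparison with `average_empty_triangles` becomes informative only for larger n.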


The result above can be extended to higher dimensions by applying a similar
probabilistic construction. A set X of n points in the d-dimensional Euclidean
space is called independent if no $d + 1$ of the points lie on a hyperplane. A simplex
determined by $d + 1$ of the points is called empty if it contains no other point of X. Let
$f_d(X)$ denote the number of empty simplices of X, and define $f_d(n) = \min f_d(X)$,
where X ranges over all independent sets of n points in $\mathbb{R}^d$. Katchalski and Meir
(1988) showed that $f_d(n) \ge \binom{n-1}{d}$. The following theorem of Bárány and Füredi
shows that here, again, a probabilistic construction gives a matching upper bound, up
to a constant factor (that depends on the dimension). We omit the detailed proof.


Theorem 14.2.2 There exists a constant $K = K(d)$, such that for every convex,
bounded set $A \subset \mathbb{R}^d$ with nonempty interior, if X is a random set of n points
obtained by n random and independent choices of points of A picked with uniform
distribution, then the expected number of empty simplices of X is at most $K\binom{n}{d}$.


14.3 GEOMETRICAL REALIZATIONS OF SIGN MATRICES 


Let $A = (a_{i,j})$ be an m by n matrix with ±1 entries. We say that A is realizable
in $\mathbb{R}^d$ if there are m hyperplanes $H_1, \ldots, H_m$ in $\mathbb{R}^d$ passing through the origin and
n points $P_1, \ldots, P_n$ in $\mathbb{R}^d$, so that for all i and j, $P_j$ lies on the positive side of $H_i$
if $a_{i,j} = +1$, and on the negative side if $a_{i,j} = -1$. Let $d(A)$ denote the minimum
dimension d such that A is realizable in $\mathbb{R}^d$, and define $d(m, n) = \max d(A)$, where
A ranges over all m by n matrices with ±1 entries. Since $d(m, n) = d(n, m)$ we
can consider only the case $m \ge n$.

The problem of determining or estimating $d(m, n)$ and, in particular, $d(n, n)$ was
raised by Paturi and Simon (1984). This problem was motivated by an attempt
to estimate the maximum possible “unbounded-error probabilistic communication
complexity” of Boolean functions. Alon, Frankl and Rödl (1985) proved that as n
grows $\frac{1}{32}n \le d(n, n) \le (\frac{1}{2} + o(1))n$. Both the upper and the lower bounds are
proved by combining probabilistic arguments with certain other ideas. In the next
theorem we prove the upper bound, which is probably closer to the truth.


Theorem 14.3.1 For all $m \ge n$,

$$d(m, n) \le \frac{n+1}{2} + \sqrt{\frac{n-1}{2}\log m}.$$

For the proof, we need a definition and two lemmas. For a vector $a = (a_1, \ldots, a_n)$
of ±1 entries, the number of sign changes in a is the number of indices i, $1 \le i \le n - 1$,
such that $a_i = -a_{i+1}$. For a matrix A of ±1 entries, denote by $s(A)$ the maximum
number of sign changes in a row of A.


Lemma 14.3.2 For any matrix A of ±1 entries, $d(A) \le s(A) + 1$.


Proof. Let $A = (a_{i,j})$ be an m by n matrix of ±1 entries and suppose $s = s(A)$.
Let $t_1 < t_2 < \cdots < t_n$ be arbitrary reals, and define n points $P_1, P_2, \ldots, P_n$ in
$\mathbb{R}^{s+1}$ by $P_j = (1, t_j, t_j^2, \ldots, t_j^s)$. These points, whose last s coordinates represent
points on the s-dimensional moment curve, will be the points used in the realization
of A. To complete the proof we have to show that each row of A can be realized by a
suitable hyperplane through the origin. This is proved by applying some of the known
properties of the moment curve as follows. Consider the sign vector representing
an arbitrary row of A. Suppose this vector has r sign changes, where, of course,
$r \le s$. Suppose the sign changes in this vector occur between the coordinates $i_j$ and
$i_j + 1$, for $1 \le j \le r$. Choose arbitrary reals $y_1, \ldots, y_r$, where $t_{i_j} < y_j < t_{i_j+1}$
for $1 \le j \le r$. Consider the polynomial $P(t) = \prod_{j=1}^{r}(t - y_j)$. Since its degree
is at most s there are real numbers $a_j$ such that $P(t) = \sum_{j=0}^{s} a_j t^j$. Let H be the
hyperplane in $\mathbb{R}^{s+1}$ defined by $H = \{(x_0, x_1, \ldots, x_s) \in \mathbb{R}^{s+1} : \sum_{j=0}^{s} a_j x_j = 0\}$.
Clearly the point $P_j = (1, t_j, \ldots, t_j^s)$ is on the positive side of this hyperplane if
$P(t_j) > 0$, and is on its negative side if $P(t_j) < 0$. Since the polynomial P changes
sign only in the values $y_j$, it follows that the hyperplane H separates the points
$P_1, \ldots, P_n$ according to the sign pattern of the corresponding row of A. Hence,
by choosing the orientation of H appropriately, we conclude that A is realizable in
$\mathbb{R}^{s+1}$, completing the proof of the lemma. ■
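The construction in this proof translates directly into code. The sketch below is ours and makes some specific choices that the proof leaves free: it takes $t_j = j$, places each root $y_j$ midway between the two neighboring $t$ values, and uses exact rational arithmetic; the function names are hypothetical.

```python
from fractions import Fraction


def sign_changes(row):
    return sum(1 for x, y in zip(row, row[1:]) if x != y)


def row_polynomial(row, ts):
    """Coefficients a_0, a_1, ... of a polynomial whose sign at ts[j] is row[j]."""
    roots = [Fraction(ts[j] + ts[j + 1], 2)
             for j in range(len(row) - 1) if row[j] != row[j + 1]]
    coeffs = [Fraction(1)]
    for y in roots:
        shifted = [Fraction(0)] + coeffs                 # t * P(t)
        scaled = [y * c for c in coeffs] + [Fraction(0)]  # y * P(t)
        coeffs = [s - z for s, z in zip(shifted, scaled)]
    # choose the orientation so that the sign at the first point is correct
    first = sum(c * ts[0] ** k for k, c in enumerate(coeffs))
    if (first > 0) != (row[0] > 0):
        coeffs = [-c for c in coeffs]
    return coeffs


def realize(A):
    """Points on the moment curve and hyperplane normals in dimension s(A) + 1."""
    n = len(A[0])
    s = max(sign_changes(row) for row in A)
    ts = list(range(1, n + 1))
    points = [[t ** k for k in range(s + 1)] for t in ts]
    normals = []
    for row in A:
        coeffs = row_polynomial(row, ts)
        normals.append(coeffs + [Fraction(0)] * (s + 1 - len(coeffs)))
    return points, normals
```

The sign of the inner product of `normals[i]` with `points[j]` then equals the entry $a_{i,j}$.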


Lemma 14.3.3 For every m by n matrix A of ±1 entries there is a matrix B obtained
from A by multiplying some of the columns of A by −1, such that

$$s(B) \le \frac{n-1}{2} + \sqrt{\frac{n-1}{2}\log m}.$$

Proof. For each column of A, randomly and independently, choose a number
$\epsilon \in \{+1, -1\}$, where each of the two choices is equally likely, and multiply this column
by $\epsilon$. Let B be the random sign matrix obtained in this way. Consider an arbitrary
fixed row of B. One can easily check that the random variable describing the number
of sign changes in this row is a binomial random variable with parameters $n - 1$ and
$p = 1/2$. This is because no matter what the entries of A in this row are, the row
of B is a totally random row of ±1 entries. By the standard estimates for binomial
distributions, described in Appendix A, the probability that this number is greater
than $\frac{1}{2}(n - 1) + \sqrt{\frac{1}{2}(n - 1)\log m}$ is smaller than $1/m$. Therefore with positive
probability the number of sign changes in each of the m rows is at most that large,
completing the proof. ■
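The lemma only asserts that a good choice of column signs exists, but in practice a few random attempts usually suffice. The following sketch is our own illustration; the retry loop and the parameter `max_tries` are assumptions, not part of the proof, and the logarithm is taken to be natural.

```python
import math
import random


def sign_changes(row):
    return sum(1 for x, y in zip(row, row[1:]) if x != y)


def flip_columns(A, seed=0, max_tries=1000):
    """Search for column signs eps with s(B) at most the bound of Lemma 14.3.3."""
    m, n = len(A), len(A[0])
    bound = (n - 1) / 2 + math.sqrt((n - 1) / 2 * math.log(m))
    rng = random.Random(seed)
    for _ in range(max_tries):
        eps = [rng.choice((1, -1)) for _ in range(n)]
        B = [[a * e for a, e in zip(row, eps)] for row in A]
        if max(sign_changes(row) for row in B) <= bound:
            return eps, B, bound
    raise RuntimeError("no suitable column signs found; increase max_tries")
```

For example, a matrix whose rows all alternate in sign has $s(A) = n - 1$, yet random column flips bring the number of sign changes down to roughly $n/2$.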


Proof [Theorem 14.3.1]. Let A be an arbitrary m by n matrix of ±1 entries.
By Lemma 14.3.3 there is a matrix B obtained from A by replacing some of its
columns by their inverses, such that $s(B) \le \frac{1}{2}(n - 1) + \sqrt{\frac{1}{2}(n - 1)\log m}$. Observe
that $d(A) = d(B)$, since any realization of one of these matrices by points and
hyperplanes through the origin gives a realization of the other one by replacing the
points corresponding to the altered columns by their antipodal points. Therefore, by
Lemma 14.3.2,

$$d(A) = d(B) \le s(B) + 1 \le \frac{n+1}{2} + \sqrt{\frac{n-1}{2}\log m}.$$

This completes the proof. ■

It is worth noting that by applying the (general) six standard deviations theorem
stated at the end of Section 13.2, the estimate in Lemma 14.3.3 (and hence in
Theorem 14.3.1) can be improved to $\frac{1}{2}n + O(\sqrt{n \log(m/n)})$. It can also be shown
that if n and m grow so that $m/n^2$ tends to infinity and $(\log_2 m)/n$ tends to 0 then
for almost all m by n matrices A of ±1 entries $d(A) = (\frac{1}{2} + o(1))n$.


14.4 ε-NETS AND VC-DIMENSIONS OF RANGE SPACES


What is the minimum number $f = f(n, \varepsilon)$ such that every set X of n points in the
plane contains a subset S of at most f points such that every triangle containing at
least $\varepsilon n$ points of X contains at least one point of S? As we shall see in this section,
there is an absolute constant c such that $f(n, \varepsilon) \le (c/\varepsilon)\log(1/\varepsilon)$ and this estimate
holds for every n. This somewhat surprising result is a very special case of a general
theorem of Vapnik and Chervonenkis (1971), which has been extended by Haussler
and Welzl (1987), and which has many interesting applications in computational
geometry and in statistics.

In order to describe this result we need a few definitions. A range space S is a
pair $(X, R)$, where X is a (finite or infinite) set and R is a (finite or infinite) family
of subsets of X. The members of X are called points and those of R are called
ranges. If A is a subset of X then $P_R(A) = \{r \cap A : r \in R\}$ is the projection of
R on A. In case this projection contains all subsets of A we say that A is shattered.
The Vapnik–Chervonenkis dimension (or VC-dimension) of S, denoted by VC(S),
is the maximum cardinality of a shattered subset of X. If there are arbitrarily large
shattered subsets then $VC(S) = \infty$.
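For a finite range space these definitions can be checked by brute force. The sketch below is ours (hypothetical function names) and assumes a nonempty family of ranges; it runs in exponential time and is meant only to make the definitions concrete.

```python
from itertools import combinations


def projection(ranges, A):
    """P_R(A): the distinct intersections of the ranges with A."""
    A = frozenset(A)
    return {frozenset(r) & A for r in ranges}


def is_shattered(ranges, A):
    return len(projection(ranges, A)) == 2 ** len(A)


def vc_dimension(points, ranges):
    """Largest size of a shattered subset (assumes a nonempty family of ranges)."""
    best = 0
    for k in range(1, len(points) + 1):
        if any(is_shattered(ranges, A) for A in combinations(points, k)):
            best = k
        else:
            break  # subsets of shattered sets are shattered, so we can stop
    return best
```

As an example, for points on a line with all intervals as ranges, every pair of points is shattered but no triple is, since no interval contains the two outer points while missing the middle one.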

The number of ranges in any finite range space with a given number of points
and a given VC-dimension cannot be too large. For integers $n \ge 0$ and $d \ge 0$,
define a function $g(d, n)$ by $g(d, n) = \sum_{i=0}^{d} \binom{n}{i}$. Observe that for all $n, d \ge 1$,
$g(d, n) = g(d, n - 1) + g(d - 1, n - 1)$. The following combinatorial lemma was
proved, independently, by Sauer (1972), by Perles and Shelah and, in a slightly
weaker form, by Vapnik and Chervonenkis.

Lemma 14.4.1 If $(X, R)$ is a range space of VC-dimension d with $|X| = n$ points
then $|R| \le g(d, n)$.


Proof. We apply induction on $n + d$. The assertion is trivially true for $d = 0$ and
$n = 0$. Assuming it holds for $n - 1$ and d and for $n - 1$ and $d - 1$ we prove it for
n and d. Let $S = (X, R)$ be a range space of VC-dimension d on n points. Suppose
$x \in X$, and consider the two range spaces $S - x$ and $S \setminus x$ defined as follows.

$$S - x = (X - \{x\}, R - x), \quad \text{where } R - x = \{r \setminus \{x\} : r \in R\},$$
$$S \setminus x = (X - \{x\}, R \setminus x), \quad \text{where } R \setminus x = \{r \in R : x \notin r, \ r \cup \{x\} \in R\}.$$

Clearly the VC-dimension of $S - x$ is at most d. It is also easy to see that the
VC-dimension of $S \setminus x$ is at most $d - 1$. Therefore, by the induction hypothesis,

$$|R| = |R - x| + |R \setminus x| \le g(d, n - 1) + g(d - 1, n - 1) = g(d, n),$$

completing the proof. ■


It is easy to check that the estimate given in the above lemma is sharp for all
possible values of n and d. If $(X, R)$ is a range space of VC-dimension d and
$A \subset X$, then the VC-dimension of $(A, P_R(A))$ is clearly at most d. Therefore the
last lemma implies the following.


Corollary 14.4.2 If $(X, R)$ is a range space of VC-dimension d then for every finite
subset A of X, $|P_R(A)| \le g(d, |A|)$.
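The function $g(d, n)$ and its recurrence are simple to compute. The short sketch below, with names of our choosing, also lists one family showing that the bound of Lemma 14.4.1 is attained: all subsets of an n-element set of size at most d. That family has VC-dimension d (for $d \le n$), since every d-set is shattered and no $(d+1)$-set is, as the set itself is never a trace.

```python
from itertools import combinations
from math import comb


def g(d, n):
    """g(d, n) = sum of binomial(n, i) for i = 0, ..., d."""
    return sum(comb(n, i) for i in range(d + 1))


def small_sets(n, d):
    """All subsets of {0, ..., n-1} of size at most d."""
    return [frozenset(c) for k in range(d + 1) for c in combinations(range(n), k)]
```

Here `len(small_sets(n, d))` equals `g(d, n)` by definition, so this family meets the bound with equality.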


There are many range spaces with finite VC-dimension that arise naturally in
discrete and computational geometry. One such example is the space $S = (\mathbb{R}^d, H)$,
whose points are all the points in the d-dimensional Euclidean space, and whose set
of ranges is the set of all (open) half-spaces. Any set of $d + 1$ affinely independent
points is shattered in this space, and, by Radon's Theorem, no set of $d + 2$ points is
shattered. Therefore $VC(S) = d + 1$. As shown by Dudley (1978), if $(X, R)$ has finite
VC-dimension, so does $(X, R_k)$, where $R_k$ is the set of all Boolean combinations
formed from at most k ranges in R. In particular, the following statement is a simple
consequence of Corollary 14.4.2.


Corollary 14.4.3 Let $(X, R)$ be a range space of VC-dimension $d \ge 2$ and let
$(X, R_h)$ be the range space on X in which $R_h = \{(r_1 \cap \cdots \cap r_h) : r_1, \ldots, r_h \in R\}$.
Then $VC(X, R_h) \le 2dh \log(dh)$.


Proof. Let A be an arbitrary subset of cardinality n of X. By Corollary 14.4.2
$|P_R(A)| \le g(d, n) \le n^d$. Since each member of $P_{R_h}(A)$ is an intersection of
h members of $P_R(A)$, it follows that $|P_{R_h}(A)| \le \binom{g(d,n)}{h} \le n^{dh}$. Therefore, if
$n^{dh} < 2^n$, then A cannot be shattered. But this inequality holds for $n \ge 2dh \log(dh)$,
since $dh \ge 4$. ■


As shown above, the range space whose set of points is $\mathbb{R}^d$ and whose set of ranges
is the set of all half-spaces has VC-dimension $d + 1$. This and the last corollary imply
that the range space $(\mathbb{R}^d, C_h)$, where $C_h$ is the set of all convex d-polytopes with h
facets, has a VC-dimension that does not exceed $2(d + 1)h \log((d + 1)h)$.

An interesting property of range spaces with a finite VC-dimension is the fact that
each finite subset of such a set contains relatively small good samples in the sense
described below. Let $(X, R)$ be a range space and let A be a finite subset of X. For
$0 \le \varepsilon \le 1$, a subset $B \subset A$ is an ε-sample for A if for any range $r \in R$ the inequality

$$\left| \frac{|A \cap r|}{|A|} - \frac{|B \cap r|}{|B|} \right| \le \varepsilon$$

holds. Similarly, a subset $N \subset A$ is an ε-net for A if any range $r \in R$ satisfying
$|r \cap A| > \varepsilon|A|$ contains at least one point of N.
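Both definitions can be tested directly on a finite range space. The sketch below is ours; the function names and the representation of ranges as Python sets are assumptions made for illustration.

```python
def is_eps_net(N, A, ranges, eps):
    """Every range holding more than eps*|A| points of A must meet N."""
    A, N = set(A), set(N)
    return all(N & r for r in map(set, ranges) if len(A & r) > eps * len(A))


def is_eps_sample(B, A, ranges, eps):
    """Every range has nearly the same relative weight in B as in A."""
    A, B = set(A), set(B)
    return all(
        abs(len(A & r) / len(A) - len(B & r) / len(B)) <= eps
        for r in map(set, ranges)
    )
```

For example, take A to be the integers 0 through 99 with all intervals as ranges. The ten multiples of 10 form both a 0.1-net and a 0.1-sample, but not a 0.05-net, because the interval from 1 to 9 contains nine points of A and misses all of them.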

Note that every ε-sample for A is also an ε-net and that the converse is not true.
However, both notions define subsets of A that represent approximately some of the
behavior of A with respect to the ranges. Our objective is to show the existence
of small ε-nets or ε-samples for finite sets in some range spaces. Observe that if
$(X, R)$ is a range space with an infinite VC-dimension then for every n there is a
shattered subset A of X of cardinality n. It is obvious that any ε-net (and hence
certainly any ε-sample) for such an A must contain at least $(1 - \varepsilon)n$ points; that is,
it must contain almost all points of A. Therefore in infinite VC-dimension there are
no small nets or samples. However, it turns out that in finite VC-dimension there are
always very small nets and samples. The following theorem was proved by Vapnik
and Chervonenkis (1971).


Theorem 14.4.4 There is a positive constant c such that if $(X, R)$ is any range space
of VC-dimension at most d, $A \subset X$ is a finite subset and $\varepsilon, \delta > 0$, then a random
subset B of cardinality s of A, where s is at least the minimum between $|A|$ and

$$\frac{c}{\varepsilon^2}\left(d \log\frac{d}{\varepsilon} + \log\frac{1}{\delta}\right),$$

is an ε-sample for A with probability at least $1 - \delta$.

Using similar ideas, Haussler and Welzl (1987) proved the following theorem.

Theorem 14.4.5 Let $(X, R)$ be a range space of VC-dimension d, let A be a finite
subset of X and suppose $0 < \varepsilon, \delta < 1$. Let N be a set obtained by m random
independent draws from A, where

$$m \ge \max\left\{\frac{4}{\varepsilon}\log\frac{2}{\delta}, \ \frac{8d}{\varepsilon}\log\frac{8d}{\varepsilon}\right\}. \qquad (14.2)$$

Then N is an ε-net for A with probability at least $1 - \delta$.


Therefore, if A is a finite subset of a range space of finite VC-dimension d, then
for any $\varepsilon > 0$, A contains ε-nets as well as ε-samples whose size is at most some
function of ε and d, independent of the cardinality of A! The result about the triangles
mentioned in the first paragraph of this section thus follows from Theorem 14.4.5,
together with the observation following Corollary 14.4.3 that implies that the range
space whose ranges are all triangles in the plane has a finite VC-dimension. We note
that, as shown by Pach and Woeginger (1990), there are cases in which, for fixed δ,
the dependence of m in $1/\varepsilon$ cannot be linear, but there is no known natural geometric
example demonstrating this phenomenon. See also Komlós, Pach and Woeginger
(1992) for a tight form of the last theorem.

The proofs of Theorems 14.4.4 and 14.4.5 are very similar. Since the computation
in the proof of Theorem 14.4.5 is simpler, we describe here only the proof of this
theorem and encourage the reader to try and make the required modifications that
yield a proof for Theorem 14.4.4.


Proof [Theorem 14.4.5]. Let $(X, R)$ be a range space with VC-dimension d and
let A be a subset of X of cardinality $|A| = n$. Suppose m satisfies (14.2), and let
$N = (x_1, \ldots, x_m)$ be obtained by m independent random choices of elements of A.
(The elements in N are not necessarily distinct, of course.) Let $E_1$ be the following
event:

$$E_1 = \{\exists r \in R : |r \cap A| > \varepsilon n, \ r \cap N = \emptyset\}.$$

To complete the proof we must show that the probability of $E_1$ is at most δ. To
this end, we make an additional random choice and define another event as follows.
Independently of our previous choice, we let $T = (y_1, \ldots, y_m)$ be obtained by m
independent random choices of elements of A. Let $E_2$ be the event defined by

$$E_2 = \left\{\exists r \in R : |r \cap A| > \varepsilon n, \ r \cap N = \emptyset, \ |r \cap T| \ge \frac{\varepsilon m}{2}\right\}.$$

[Since the elements of T are not necessarily distinct, the notation $|r \cap T|$ means here
$|\{i : 1 \le i \le m, y_i \in r\}|$. The quantities $|r \cap N|$ and $|r \cap (N \cup T)|$ are similarly
defined.]

Claim 14.4.6 $\Pr[E_2] \ge \frac{1}{2}\Pr[E_1]$.


Proof. It suffices to prove that the conditional probability $\Pr[E_2 \mid E_1]$ is at least $\frac{1}{2}$.
Suppose that the event $E_1$ occurs. Then there is an $r \in R$ such that $|r \cap A| > \varepsilon n$ and
$r \cap N = \emptyset$. The conditional probability above is clearly at least the probability that
for this specific r, $|r \cap T| \ge \varepsilon m/2$. However, $|r \cap T|$ is a binomial random variable
with expectation $pm$ and variance $(1 - p)pm \le pm$, where $p = |r \cap A|/|A| \ge \varepsilon$.
Hence, by Chebyshev's Inequality,

$$\Pr\left[|r \cap T| < \frac{\varepsilon m}{2}\right] \le \Pr\left[|r \cap T| < \frac{pm}{2}\right] \le \frac{pm}{(pm/2)^2} = \frac{4}{pm} \le \frac{4}{\varepsilon m} \le \frac{1}{2},$$

where the last inequality follows from (14.2). Thus the assertion of Claim 14.4.6 is
correct. ■


Claim 14.4.7 $\Pr[E_2] \le g(d, 2m)2^{-\varepsilon m/2}$.


Proof. The random choice of N and T can be described in the following way, which
is equivalent to the previous one. First one chooses $N \cup T = (z_1, \ldots, z_{2m})$ by
making 2m random independent choices of elements of A, and then one chooses
randomly precisely m of the elements $z_i$ to be the set N (the remaining elements $z_j$
form the set T, of course). For each range $r \in R$ satisfying $|r \cap A| > \varepsilon n$, let $E_r$ be
the event that $|r \cap T| \ge \varepsilon m/2$ and $r \cap N = \emptyset$. A crucial fact is that if $r, r' \in R$ are
two ranges, $|r \cap A| > \varepsilon n$ and $|r' \cap A| > \varepsilon n$ and if $r \cap (N \cup T) = r' \cap (N \cup T)$,
then the two events $E_r$ and $E_{r'}$, when both are conditioned on the choice of $N \cup T$,
are identical. This is because the occurrence of $E_r$ depends only on the intersection
$r \cap (N \cup T)$. Therefore, for any fixed choice of $N \cup T$, the number of distinct
events $E_r$ does not exceed the number of different sets in the projection $P_R(N \cup T)$.
Since the VC-dimension of $(X, R)$ is d, Corollary 14.4.2 implies that this number does
not exceed $g(d, 2m)$.

Let us now estimate the probability of a fixed event of the form $E_r$ given the
choice of $N \cup T$. This probability is at most

$$\Pr\left[r \cap N = \emptyset \ \Big| \ |r \cap (N \cup T)| \ge \frac{\varepsilon m}{2}\right].$$

Define $s = |r \cap (N \cup T)|$. Since the choice of N among the elements of $N \cup T$ is
independent of the choice of $N \cup T$, the last conditional probability is precisely

$$\frac{(2m - s)(2m - s - 1) \cdots (m - s + 1)}{2m(2m - 1) \cdots (m + 1)}
= \frac{m(m - 1) \cdots (m - s + 1)}{2m(2m - 1) \cdots (2m - s + 1)} \le 2^{-s} \le 2^{-\varepsilon m/2}.$$

Since there are at most $g(d, 2m)$ potential distinct events $E_r$, it follows that the
probability that at least one of them occurs given the choice of $N \cup T$ is at most
$g(d, 2m)2^{-\varepsilon m/2}$. Since this estimate holds conditioned on every possible choice of
$N \cup T$ it follows that the probability of the event $E_2$ is at most $g(d, 2m)2^{-\varepsilon m/2}$. This
establishes Claim 14.4.7. ■


By Claims 14.4.6 and 14.4.7, $\Pr[E_1] \le 2g(d, 2m)2^{-\varepsilon m/2}$. To complete the proof
of the theorem it remains to show that if m satisfies inequality (14.2) then

$$2g(d, 2m)2^{-\varepsilon m/2} \le \delta.$$

We describe the proof for $d \ge 2$. The computation for $d = 1$ is easier. Since
$g(d, 2m) \le (2m)^d$ it suffices to show that $2(2m)^d \le \delta 2^{\varepsilon m/2}$; that is,

$$\frac{\varepsilon m}{2} \ge d \log(2m) + \log\frac{2}{\delta}.$$

From (14.2) it follows that

$$\frac{\varepsilon m}{4} \ge \log\frac{2}{\delta},$$

and hence it suffices to show that $\frac{\varepsilon m}{4} \ge d \log(2m)$.
The validity of the last inequality for some value of m implies its validity for any
larger m and hence it suffices to check that it is satisfied for $m = (8d/\varepsilon)\log(8d/\varepsilon)$;
that is,

$$2d \log\frac{8d}{\varepsilon} \ge d \log\left(\frac{16d}{\varepsilon}\log\frac{8d}{\varepsilon}\right).$$

The last inequality is equivalent to $4d/\varepsilon \ge \log(8d/\varepsilon)$, which is certainly true. This
completes the proof of the theorem. ■
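The final computation can be checked numerically for concrete parameters. In the sketch below, which is ours, the logarithms in (14.2) are taken to base 2, as the step from $2(2m)^d \le \delta 2^{\varepsilon m/2}$ to the logarithmic form suggests; the function names are hypothetical.

```python
from math import ceil, comb, log2


def net_sample_size(d, eps, delta):
    """Smallest integer m satisfying (14.2), with logarithms to base 2."""
    return ceil(max(4 / eps * log2(2 / delta), 8 * d / eps * log2(8 * d / eps)))


def failure_bound(d, m, eps):
    """The bound 2 g(d, 2m) 2^(-eps*m/2) on the probability of E_1."""
    g = sum(comb(2 * m, i) for i in range(d + 1))
    return 2 * g * 2.0 ** (-eps * m / 2)
```

For $d = 2$, $\varepsilon = 0.1$ and $\delta = 0.05$ the second term of (14.2) dominates, and the resulting failure bound is far below δ, which reflects the slack in the estimates used in the proof.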


Theorems 14.4.4 and 14.4.5 have been used for constructing efficient data structures
for various problems in computational geometry. A trivial example is just the
observation that Theorem 14.4.4 implies the following: For every $\varepsilon > 0$ there is a
constant $c = c(\varepsilon)$ such that for every n and every set A of n points in the plane there
is a data structure of size $c(\varepsilon)$ that enables us to estimate, given any triangle in the
plane, the number of points of A in this triangle up to an additive error of $\varepsilon n$. This is
done simply by storing the coordinates of a set of points that form an ε-sample for A
considered as a subset of the range space whose ranges are all planar triangles. More
sophisticated data structures whose construction relies on the above two theorems
can be found in the paper of Haussler and Welzl (1987).


14.5 DUAL SHATTER FUNCTIONS AND DISCREPANCY 


The dual shatter function h of a range space $S = (X, R)$ is the function h mapping
integers to integers, defined by letting $h(g)$ denote the maximum, over all possible
choices of g members of R, of the number of atoms in the Venn diagram of these
members. It is not too difficult to prove that if the VC-dimension of S is d, then
$h(g) \le O(g^{2^{d+1}-1})$, but in geometric applications it is usually better to bound this
function directly.
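For a small finite range space the dual shatter function can be computed by brute force. In the sketch below, which is ours, an atom is a nonempty class of points sharing the same membership pattern with respect to the chosen ranges, including the class of points lying in none of them; the function names are hypothetical.

```python
from itertools import combinations


def atoms(X, sets):
    """Number of nonempty cells of the Venn diagram of the given sets within X."""
    return len({tuple(x in s for s in sets) for x in X})


def dual_shatter(X, ranges, g):
    """h(g): the maximum number of atoms over all choices of g ranges."""
    return max(atoms(X, chosen) for chosen in combinations(ranges, g))
```

For intervals on a line the count grows only linearly in g, far below the $2^g$ cells of a general Venn diagram. This is the kind of direct bound referred to above.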

In Matoušek, Welzl and Wernisch (1993) it is proved that if the dual shatter
function of a range space $S = (X, R)$ satisfies $h(g) \le O(g^t)$, A is any set of n points
in the range space and $\mathcal{F}$ is the projection $P_R(A)$ of R on A, then the discrepancy of
$\mathcal{F}$ satisfies

$$\mathrm{disc}(\mathcal{F}) \le O(n^{1/2 - 1/(2t)}\sqrt{\log n}). \qquad (14.3)$$

This supplies nontrivial estimates in various geometric situations, improving the
trivial bound that follows from Theorem 13.1.1 of Chapter 13. In most of these
geometric applications it is widely believed that the $\sqrt{\log n}$ factor can be omitted. In
the abstract setting, however, this factor cannot be omitted, as proved for $t = 2, 3$ in
Matoušek (1997) and later for all t in Alon, Rónyai and Szabó (1999).

The proof of (14.3) is based on a beautiful result of Chazelle and Welzl (1989)
and its improvement by Haussler (1995). It is somewhat simpler to prove the result
with an extra logarithmic factor, and this is the proof we present here. See Pach and
Agarwal (1995) for some additional information.

Let $\mathcal{F}$ be a family of subsets of a finite set A. In what follows we consider graphs
whose edges are (unordered) pairs of points of A. For $F \in \mathcal{F}$ and $x, y \in A$, the
edge xy stabs F if F contains exactly one of the two points x and y. The following
theorem is proved in Chazelle and Welzl (1989). An improvement by a logarithmic
factor appears in Haussler (1995).


Theorem 14.5.1 Let $(A, \mathcal{F})$ be a finite range space, where $|A| = n$, and suppose
that its dual shatter function h satisfies $h(g) \le cg^t$ for some fixed $c, t > 0$. Then,
there is a $C = C(c, t)$ and a Hamiltonian path on A, such that each member F of $\mathcal{F}$
is stabbed by at most $Cn^{1-1/t}\log n$ edges of the path.


To prove the above theorem, we need the following lemma.

Lemma 14.5.2 Let $(A, \mathcal{F})$, n, h, t and c be as above, let B be a finite subset of $p > 1$
points of A, and let $\mathcal{G}$ be a collection of m (not necessarily distinct) members of
$\mathcal{F}$. Then there are two distinct points x, y in B, such that the edge xy stabs at most
$(bm \log p)/p^{1/t}$ members of $\mathcal{G}$, where $b = b(c)$.


Proof. We may and will assume that p is larger than $c + 1$. Let g be the largest integer
such that $cg^t \le p - 1$, that is, $g = \lfloor ((p - 1)/c)^{1/t} \rfloor$. Let L be a random collection of g
members of $\mathcal{G}$, each picked, randomly and independently (with possible repetitions),
among all m members of $\mathcal{G}$ with uniform distribution. The Venn diagram of all
members of L partitions B into at most $h(g) \le cg^t < p$ atoms and hence there are
two distinct points x, y of B that lie in the same atom.

To complete the proof it suffices to show that with positive probability, for each
pair of points of B that stabs more than $(bm \log p)/p^{1/t}$ members of $\mathcal{G}$, at least
one of these members lies in L (and hence the pair does not lie in an atom of the
corresponding Venn diagram). There are $\binom{p}{2}$ such pairs and for each of them the
probability that L contains no member of $\mathcal{G}$ it stabs is at most

$$\left(1 - \frac{b \log p}{p^{1/t}}\right)^g \le \exp\left(-\frac{b \log p}{p^{1/t}}\left\lfloor\left(\frac{p-1}{c}\right)^{1/t}\right\rfloor\right),$$

which is less than $1/p^2$ for an appropriately chosen constant $b = b(c)$. This completes
the proof. ■


Proof [Theorem 14.5.1]. Note first that if d is the VC-dimension of the given
space, then there is a shattered set D of size d. It is not difficult to see that there
are $g = \lceil \log_2 d \rceil$ sets among those shattering D, so that no two points of D lie in
the same atom of their Venn diagram. Therefore $d \le c(\lceil \log_2 d \rceil)^t$, implying that
$d \le 2^{c' t \log t}$, where $c' = c'(c)$. By Lemma 14.4.1 this implies that the total number
of ranges in $\mathcal{F}$ is at most $n^{2^{c' t \log t}}$.

We next prove that there is a spanning tree of A satisfying the assertion of
Theorem 14.5.1, and then show how to replace it by a Hamiltonian path. By Lemma 14.5.2
with $B_0 = A$, $p_0 = n$ and $\mathcal{G}_0 = \mathcal{F}$, $m_0 = |\mathcal{G}_0|$ $(\le n^{2^{c' t \log t}})$, we conclude that there
is a pair $x_0, y_0$ of points in A such that the edge $x_0y_0$ does not stab more than
$m_0(b \log n)/n^{1/t}$ members of $\mathcal{G}_0$. Let $\mathcal{G}_1$ be the collection obtained from $\mathcal{G}_0$ by
duplicating all members of $\mathcal{G}_0$ that are stabbed by $x_0y_0$, and define $B_1 = B_0 - x_0$, $p_1 = n - 1$,
$m_1 = |\mathcal{G}_1| \le m_0(1 + (b \log n)/n^{1/t})$. Applying Lemma 14.5.2 again, this time to $B_1$
and $\mathcal{G}_1$, we obtain another pair $x_1, y_1$, define $B_2 = B_1 - x_1$, $p_2 = p_1 - 1 = n - 2$, and
let $\mathcal{G}_2$ be the collection obtained from $\mathcal{G}_1$ by duplicating all members of $\mathcal{G}_1$ stabbed by
$x_1y_1$, $m_2 = |\mathcal{G}_2|$. By the assertion of the lemma, $m_2 \le m_1(1 + (b \log n)/(n - 1)^{1/t})$.
Proceeding in this manner we get a sequence $x_0y_0, x_1y_1, \ldots, x_{n-2}y_{n-2}$ of edges
of a graph on A, a sequence of subsets $B_0 = A, B_1, \ldots, B_{n-2}$, where each $B_i$
is obtained from the previous one by omitting the point $x_{i-1}$, and a sequence of
collections $\mathcal{G}_0, \mathcal{G}_1, \ldots, \mathcal{G}_{n-1}$, where

$$|\mathcal{G}_{n-1}| \le m_0 \prod_{j=0}^{n-2}\left(1 + \frac{b \log n}{(n - j)^{1/t}}\right)
\le n^{2^{c' t \log t}} \exp\left(b \log n \sum_{j=0}^{n-2} (n - j)^{-1/t}\right) \le 2^{b' n^{1-1/t} \log n}$$

for an appropriate $b' = b'(c, t)$.

Note, now, that the edges $x_iy_i$ form a spanning tree on the set A. The crucial
observation is the fact that if a member of $\mathcal{F}$ is stabbed by s of the edges, then it is
being duplicated s times during the above process that generates $\mathcal{G}_{n-1}$, implying that
$2^s \le |\mathcal{G}_{n-1}|$ and hence that $s \le b' n^{1-1/t} \log n$. It remains to replace the spanning
tree by a Hamiltonian path. To do so, replace each edge of the tree by two parallel
edges and take an Euler tour in the resulting graph (in which all degrees are even). This
is a sequence $x_0, x_1, x_2, \ldots, x_{2n-2} = x_0$ of points of A such that each adjacent pair
of elements of the sequence is an edge of the tree, and each edge appears twice this
way. The subsequence of the above one obtained by keeping only the first appearance
of each point of A is a Hamiltonian path; it is easy to check that each member of $\mathcal{F}$
is stabbed by at most $2b' n^{1-1/t} \log n$ of its edges, completing the proof. ■
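The last step of the proof, passing from a spanning tree to a Hamiltonian path, can be sketched as follows. This is our own illustration with hypothetical names: the Euler tour of the doubled tree is produced by a depth-first walk, and the path keeps the first appearance of each vertex.

```python
def tour_and_path(n, tree_edges, root=0):
    """Euler tour of the doubled tree on vertices 0..n-1, and the shortcut path."""
    adj = {v: [] for v in range(n)}
    for u, v in tree_edges:
        adj[u].append(v)
        adj[v].append(u)
    tour = [root]
    seen = {root}
    stack = [(root, iter(adj[root]))]
    while stack:
        v, neighbors = stack[-1]
        for w in neighbors:
            if w not in seen:
                seen.add(w)
                tour.append(w)           # walk down the tree edge vw
                stack.append((w, iter(adj[w])))
                break
        else:
            stack.pop()
            if stack:
                tour.append(stack[-1][0])  # walk back up to the parent
    path = list(dict.fromkeys(tour))     # first appearance of each vertex
    return tour, path


def stabs(edges, F):
    """Number of edges with exactly one endpoint in F."""
    return sum((u in F) != (v in F) for u, v in edges)
```

Each path edge that stabs F corresponds to a stretch of the tour containing a tree edge that stabs F, and these stretches are disjoint, so the path is stabbed at most twice as often as the tree.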


The following result is a simple consequence of Theorem 14.5.1. As mentioned
above, its assertion can be improved by a factor of $\sqrt{\log n}$.


Theorem 14.5.3 Let $(A, \mathcal{F})$ be a finite range space, where $|A| = n$, and suppose
that its dual shatter function h satisfies $h(g) \le cg^t$ for some fixed $c, t > 0$. Then,
there is a $C' = C'(c, t)$ such that the discrepancy of $\mathcal{F}$ satisfies

$$\mathrm{disc}(\mathcal{F}) \le C' n^{1/2 - 1/(2t)} \log n.$$


Proof. Without loss of generality, assume that the number of points of $A$ is even
(otherwise, simply omit a point). By Theorem 14.5.1 there is a Hamiltonian path
$x_1x_2 \cdots x_n$ on these points such that each member of $\mathcal{F}$ is stabbed by at most
$Cn^{1-1/t} \log n$ edges of the path. Let $f : A \to \{-1, 1\}$ be a random coloring of
$A$, where for each $i$, $1 \le i \le n/2$, randomly and independently, either $f(x_{2i-1}) = 1$,
$f(x_{2i}) = -1$ or $f(x_{2i-1}) = -1$, $f(x_{2i}) = 1$, the two choices being equally likely.
Fix a member $F \in \mathcal{F}$, and note that the contribution of each pair $x_{2i-1}x_{2i}$ to the
sum $\sum_{x_j \in F} f(x_j)$ is zero if the edge $x_{2i-1}x_{2i}$ does not stab $F$, and is either $+1$ or
$-1$ otherwise. It thus follows that this sum has, in the notation of Theorem A.1.1,
the distribution $S_r$ for some $r \le Cn^{1-1/t} \log n$. Thus the probability it is at least
$a$ in absolute value can be bounded by $2e^{-a^2/2r}$. As shown in the first paragraph
of the proof of Theorem 14.5.1, the total number of members of $\mathcal{F}$ does not exceed
$n^{O(1)}$, with an exponent depending only on $c$ and $t$, and thus the probability that there
exists a member $F \in \mathcal{F}$ for which the sum $\sum_{x_j \in F} f(x_j)$ exceeds $C' n^{1/2-1/(2t)} \log n$
in absolute value is less than 1 for an appropriately chosen constant $C' = C'(c, t)$. ∎
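
The pairing argument can be tried on a toy range space. The sketch below is an illustration only: it assumes points on a line with intervals as ranges (an assumption, not the general setting of the theorem), so that the left-to-right path is stabbed at most twice by any range. The names `pair_coloring` and `discrepancy` are hypothetical.

```python
import random

def pair_coloring(path, rng):
    """Give the two points of each pair (path[0], path[1]), (path[2], path[3]), ...
    opposite colors, choosing the orientation of each pair by a fair coin."""
    coloring = {}
    for i in range(0, len(path) - 1, 2):
        sign = rng.choice((-1, 1))
        coloring[path[i]] = sign
        coloring[path[i + 1]] = -sign
    return coloring

def discrepancy(ranges, coloring):
    """Largest absolute value of the color sum over the given ranges."""
    return max(abs(sum(coloring[x] for x in r)) for r in ranges)

# Toy instance (assumed): 20 points on a line, ranges are all intervals.
points = list(range(20))
intervals = [range(a, b) for a in range(20) for b in range(a + 1, 21)]
coloring = pair_coloring(points, random.Random(0))
worst = discrepancy(intervals, coloring)
print(worst)
```

Here a pair contributes to a range only when the range separates its two points, which for an interval can happen at most once at each end, so the color sum never exceeds 2 in absolute value whatever the coin flips are.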
The range space whose set of points is an arbitrary set of points in the plane, and
whose ranges are all discs in the plane, has dual shatter function $O(g^2)$. The above
theorem thus shows that it is possible to color any set of $n$ points in the plane red and
blue, such that the absolute value of the difference between the number of red points
and the number of blue points inside any disc would not exceed $n^{1/4+o(1)}$. Similar
results can be proved for many other geometric range spaces.


14.6 EXERCISES 


1. Let $A$ be a set of $n$ points in the plane, and let $\mathcal{F}$ be the set of all intersections
of $A$ with an open triangle in the plane. Prove that the discrepancy of $\mathcal{F}$ does
not exceed $n^{1/4+o(1)}$.


2. Prove that $n$ distinct points in the plane determine at most $O(n^{4/3})$ unit distances.


THE PROBABILISTIC LENS: 
Efficient Packing 


Let $C \subset \mathbb{R}^n$ be bounded with Riemann measure $\mu = \mu(C) > 0$. Let $N(C, x)$ denote
the maximal number of disjoint translates of $C$ that may be packed in a cube of side
$x$ and define the packing constant

$$\delta(C) = \mu(C) \lim_{x \to \infty} N(C, x)\, x^{-n},$$

the maximal proportion of space that may be packed by copies of $C$. The following
result improves the one described in Section 3.4.


Theorem 1 Let $C$ be bounded, convex and centrally symmetric about the origin.
Then

$$\delta(C) \ge 2^{-(n-1)}.$$


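Proof. Fix $\epsilon > 0$. Normalize so $\mu = \mu(C) = 2 - \epsilon$. For any real $z$ let $C_z$ denote the
"slab" of $(z_1, \ldots, z_{n-1}) \in \mathbb{R}^{n-1}$ such that $(z_1, \ldots, z_{n-1}, z) \in C$ and let $\mu(C_z)$ be
the usual $(n-1)$-dimensional measure of $C_z$. Riemann measurability implies

$$\lim_{\gamma \to 0} \sum_{m \in \mathbb{Z}} \mu(C_{m\gamma})\, \gamma = \mu(C).$$

Let $K$ be an integer sufficiently large so that

$$\sum_{m \in \mathbb{Z}} \mu\left(C_{mK^{-(n-1)}}\right) K^{-(n-1)} < 2$$

and further that all points of $C$ have all coordinates less than $K/2$.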
For $1 \le i \le n-1$ let $v_i \in \mathbb{R}^n$ be that vector with all coordinates zero except $K$
as the $i$th coordinate. Let

$$v = (z_1, \ldots, z_{n-1}, K^{-(n-1)}),$$

where $z_1, \ldots, z_{n-1}$ are chosen uniformly and independently from the real interval
$[0, K)$. Let $\Lambda_v$ denote the lattice generated by the $v$'s; that is,

$$\Lambda_v = \{m_1v_1 + \cdots + m_{n-1}v_{n-1} + mv : m_1, \ldots, m_{n-1}, m \in \mathbb{Z}\}$$
$$\phantom{\Lambda_v} = \{(mz_1 + m_1K, \ldots, mz_{n-1} + m_{n-1}K, mK^{-(n-1)}) : m_1, \ldots, m_{n-1}, m \in \mathbb{Z}\}.$$


Let $\theta(x)$ denote that unique $x' \in (-\frac{1}{2}K, \frac{1}{2}K]$ so that $x - mK = x'$ for some $m \in \mathbb{Z}$.
For $m \in \mathbb{Z}$ let $A_m$ be the event that some $m_1v_1 + \cdots + m_{n-1}v_{n-1} + mv \in C$.
Since all coordinates of all points of $C$ are less than $K/2$, $A_m$ occurs if and only if

$$(\theta(mz_1), \ldots, \theta(mz_{n-1}), mK^{-(n-1)}) \in C,$$

which occurs if and only if $(\theta(mz_1), \ldots, \theta(mz_{n-1})) \in C_{mK^{-(n-1)}}$. The independence
and uniformity of the $z_i$ over $[0, K)$ implies the independence and uniformity
of the $\theta(mz_i)$, $m \ne 0$, over $(-\frac{1}{2}K, \frac{1}{2}K]$ and so

$$\Pr[A_m] = K^{-(n-1)} \mu\left(C_{mK^{-(n-1)}}\right).$$


Summing over positive $m$, and employing the central symmetry,

$$\sum_{m > 0} \Pr[A_m] < \frac{1}{2} \sum_{m \in \mathbb{Z}} K^{-(n-1)} \mu\left(C_{mK^{-(n-1)}}\right) < \frac{1}{2} \cdot 2 = 1.$$


Hence there exists $v$ with all $\{A_m\}_{m>0}$ not holding. By the central symmetry $A_m$
and $A_{-m}$ are the same event so no $\{A_m\}_{m<0}$ holds. When $m = 0$ the points
$m_1v_1 + \cdots + m_{n-1}v_{n-1} = K(m_1, \ldots, m_{n-1}, 0)$ all lie outside $C$ except the origin.
For this $v$

$$\Lambda_v \cap C = \{0\}.$$


Consider the set of translates $C + 2w$, $w \in \Lambda_v$. Suppose
$z = c_1 + 2w_1 = c_2 + 2w_2$ with $c_1, c_2 \in C$, $w_1, w_2 \in \Lambda_v$.

Then $\frac{1}{2}(c_1 - c_2) = w_2 - w_1$. From convexity and central symmetry $\frac{1}{2}(c_1 - c_2) \in C$.
As $w_2 - w_1 \in \Lambda_v$, it is zero and hence $c_1 = c_2$ and $w_1 = w_2$. That is, the translates
form a packing of $\mathbb{R}^n$. As $\det(2\Lambda_v) = 2^n \det(\Lambda_v) = 2^n$ this packing has density
$2^{-n}\mu = 2^{-n}(2 - \epsilon)$. As $\epsilon > 0$ was arbitrary, $\delta(C) \ge 2^{-(n-1)}$. ∎
15


Codes, Games 
and Entropy 


Why did you come to Casablanca anyway, Rick? 
I came for the waters. 

Waters, what waters? Casablanca is in the desert. 
I was misinformed. 


— Claude Rains to Humphrey Bogart in Casablanca 


15.1 CODES 


Suppose we want to send a message, here considered a string of bits, across a noisy 
channel. There is a probability p that any bit sent will be received incorrectly. The 
value p is a parameter of the channel and cannot be changed. We assume that p is 
both the probability that a sent zero is received as a one and that a sent one is received 
as a zero. Sent bits are always received, but perhaps incorrectly. We further assume 
that the events that the bits are received incorrectly are mutually independent. The 
case p = 0.1 will provide a typical example. 

How can we improve the reliability of the system? One simple way is to send
each bit three times. When the three bits are received we use majority rule to
decode. The probability of incorrect decoding is then $3p^2(1-p) + p^3 = 0.028$ in our
instance. We have sacrificed speed (the rate of transmission of this method
is $\frac{1}{3}$) and gained accuracy in return. If we send each bit five times and use
majority rule to decode, the probability of incorrect decoding drops to $0.00856$ but
the rate of transmission also drops to $\frac{1}{5}$. Clearly we may make the probability of
incorrect decoding as low as needed, but seemingly with the trade-off that the rate of
transmission tends to zero. It is the fundamental theorem of Information Theory,
due to Claude Shannon, that this trade-off is not necessary: there are codes with
rate of transmission approaching a positive constant (dependent on $p$) with probability
of incorrect transmission approaching zero.
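
The trade-off for repetition codes can be tabulated directly. The following is a small sketch, with the hypothetical helper name `majority_error`, that computes the exact probability that a majority of an odd number of independently transmitted copies is flipped.

```python
from math import comb

def majority_error(copies: int, p: float) -> float:
    """Probability that more than half of `copies` independent transmissions
    of one bit are flipped, each with probability p (copies assumed odd)."""
    return sum(
        comb(copies, k) * p**k * (1 - p) ** (copies - k)
        for k in range(copies // 2 + 1, copies + 1)
    )

# Rate 1/copies against decoding error for the typical channel p = 0.1.
for copies in (1, 3, 5, 7, 9):
    print(copies, 1 / copies, majority_error(copies, 0.1))
```

The error column decreases geometrically while the rate column decreases only like $1/\text{copies}$, which is the apparent trade-off that Shannon's Theorem shows to be avoidable.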

A coding scheme consists of positive integers $m, n$, a function $f : \{0,1\}^m \to \{0,1\}^n$
called the encoding function, and a function $g : \{0,1\}^n \to \{0,1\}^m$ called the
decoding function. The notion is that a message (or segment of message) $x \in \{0,1\}^m$
will be encoded and sent as $f(x)$ and a received message $y \in \{0,1\}^n$ will be
decoded as $g(y)$. The rate of transmission of such a scheme is defined as $m/n$. Let
$E = (e_1, \ldots, e_n)$ be a random string defined by $\Pr[e_i = 1] = p$, $\Pr[e_i = 0] = 1 - p$,
the values $e_i$ mutually independent. We define the probability of correct transmission
as $\Pr[g(f(x) + E) = x]$. Here $x$ is assumed to be uniformly distributed over $\{0,1\}^m$
and independent of $E$; "+" here is mod 2 vector addition.

A crucial role is played by the entropy function

$$H(p) = -p \log_2 p - (1 - p) \log_2(1 - p)$$

defined for $0 < p < 1$. For any fixed $p$ the entropy function appears in the asymptotic
formula

$$\binom{n}{pn} = \frac{n^n e^{-n}}{(pn)^{pn} e^{-pn} ((1-p)n)^{(1-p)n} e^{-(1-p)n}} (1 + o(1))^n = 2^{n(H(p) + o(1))}.$$

For $0 < p < \frac{1}{2}$ we further bound

$$\sum_{i \le pn} \binom{n}{i} \le (1 + pn) \binom{n}{pn} = 2^{n(H(p) + o(1))}.$$
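
The asymptotic formula can be checked numerically. The sketch below (function names `H`, `binomial_exponent` and `ball_exponent` are hypothetical) compares $\frac{1}{n}\log_2$ of the binomial coefficient, and of the partial sum, with $H(p)$.

```python
from math import comb, log2

def H(p: float) -> float:
    """Binary entropy function for 0 < p < 1."""
    return -p * log2(p) - (1 - p) * log2(1 - p)

def binomial_exponent(n: int, p: float) -> float:
    """(1/n) * log2 of binom(n, pn), with pn rounded to an integer."""
    return log2(comb(n, round(p * n))) / n

def ball_exponent(n: int, p: float) -> float:
    """(1/n) * log2 of the sum of binom(n, i) over i <= pn."""
    return log2(sum(comb(n, i) for i in range(round(p * n) + 1))) / n

for n in (100, 1000, 10000):
    print(n, binomial_exponent(n, 0.1), ball_exponent(n, 0.1), H(0.1))
```

Both exponents approach $H(0.1) \approx 0.469$ from below and above a common limit as $n$ grows; the gap is of order $(\log n)/n$.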


Theorem 15.1.1 [Shannon's Theorem] Let $0 < p < \frac{1}{2}$ be fixed. For $\epsilon > 0$
arbitrarily small there exists a coding scheme with rate of transmission greater than
$1 - H(p) - \epsilon$ and probability of incorrect transmission less than $\epsilon$.


Remark. It is not difficult to show that, for every such $p$, any coding scheme whose
rate of transmission exceeds $1 - H(p) + \epsilon$ must have a significant error probability.
Indeed, if $f(x)$, the image of $x$, is transmitted, then with high probability, the obtained
output, $y$, is of distance $(1 + o(1))pn$ from $f(x)$. Hence, if there are $2^m$ input words,
the total size of all typical outputs is about $2^m \cdot \binom{n}{(1+o(1))pn} = 2^{m + (1+o(1))H(p)n}$. If this
quantity is much larger than $2^n$, then there must be significant overlaps between the
output sets of different input words, making the decoding likely to err.


Proof. Let $\delta > 0$ be such that $p + \delta < \frac{1}{2}$ and $H(p + \delta) < H(p) + \epsilon/2$. For $n$ large
set $m = n(1 - H(p) - \epsilon)$, guaranteeing the rate of transmission. Let $f : \{0,1\}^m \to \{0,1\}^n$
be a random function, each $f(x)$ uniformly and independently chosen.
Given $f$ define the decoding function $g : \{0,1\}^n \to \{0,1\}^m$ by setting $g(y) = x$ if
$x$ is the unique vector in $\{0,1\}^m$ whose image, $f(x)$, is within $n(p + \delta)$ of $y$. We
measure distance by the Hamming metric $\rho$: $\rho(y, y')$ is the number of coordinates
in which $y, y'$ differ. If there is no such $x$, or more than one such $x$, then we shall
consider decoding to be incorrect.

There are two ways decoding can be incorrect. Possibly $f(x) + E$ is not within
$n(p + \delta)$ of $f(x)$. The distance from $f(x) + E$ to $f(x)$ is simply the number of ones
in $E$, which has a binomial distribution $B(n, p)$ and so this occurs with probability
$o(1)$ (in fact, with exponentially small probability). The only other possibility is that
there is some $x' \ne x$ with $f(x') \in S$, where $S$ is the set of $y'$ within $n(p + \delta)$ of
$f(x) + E$. Conditioning on the values $f(x)$ and $E$, $f(x')$ is still uniformly distributed
over $\{0,1\}^n$ and hence this occurs with probability $|S|2^{-n}$ for any particular $x'$ and
thus with total probability at most

$$2^m |S| 2^{-n} \le 2^{-n(\epsilon/2 + o(1))} = o(1).$$


The total probability for incorrect decoding from both sources is thus $o(1)$ and, in
fact, exponentially small. For $n$ sufficiently large this is less than $\epsilon$.

The average over all choices of $f, x$ of the probability of incorrect decoding is
less than $\epsilon$. Thus there exists a specific $f$ (hence a specific coding scheme) with
probability of incorrect decoding less than $\epsilon$. ∎


Shannon’s Theorem, dealing with the intensely practical subject of communica- 
tions, puts the shortcomings of the probabilistic approach in sharp contrast. Where 
is the coding scheme? Supposing that a coding scheme may be found, how can 
encoding and decoding be rapidly processed? 

A group code is a coding scheme in which the map $f : \{0,1\}^m \to \{0,1\}^n$ is
linear; that is, $f(0) = 0$ and $f(x + x') = f(x) + f(x')$, all calculations done mod 2.
Group codes are of particular interest, in part because of the ease of encoding.
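
To illustrate the ease of encoding, here is a minimal sketch of a linear map built from the images of the unit vectors. Bit strings are stored as Python integers, so mod 2 vector addition is XOR; the names `random_group_code` and `encode` are hypothetical.

```python
import random

def random_group_code(m: int, n: int, rng: random.Random) -> list:
    """Choose the images of the m unit vectors uniformly in {0,1}^n.
    Each image is stored as an n-bit integer."""
    return [rng.getrandbits(n) for _ in range(m)]

def encode(basis_images: list, x: int) -> int:
    """Extend linearly: XOR the images of the unit vectors at the
    positions where x (an m-bit integer) has a one."""
    y = 0
    for i, image in enumerate(basis_images):
        if (x >> i) & 1:
            y ^= image
    return y

code = random_group_code(4, 10, random.Random(1))
print([format(encode(code, x), "010b") for x in range(4)])
```

Encoding costs at most $m$ XOR operations on $n$-bit words, and the scheme is described by $mn$ bits rather than the $n2^m$ bits needed for an arbitrary function.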


Theorem 15.1.2 Let $0 < p < \frac{1}{2}$ be fixed. For $\epsilon > 0$ arbitrarily small there exists a
group code with rate of transmission greater than $1 - H(p) - \epsilon$ and probability of
incorrect transmission less than $\epsilon$.


Proof. For $1 \le i \le m$ let $u_i \in \{0,1\}^m$ be that vector with a one in position $i$,
all other entries zero. Let $f(u_1), \ldots, f(u_m)$ be chosen randomly and independently
and then extend $f$ by setting

$$f(\epsilon_1u_1 + \cdots + \epsilon_mu_m) = \epsilon_1 f(u_1) + \cdots + \epsilon_m f(u_m).$$

We follow the proof of Shannon's Theorem until bounding the probability that $f(x) + E$
lies within $n(p + \delta)$ of $f(x')$. Set $z = x - x' = \epsilon_1u_1 + \cdots + \epsilon_mu_m$, again all
mod 2. As $x \ne x'$, $z \ne 0$. Reorder for convenience so that $\epsilon_m = 1$.

By linearity $f(z) = f(x) - f(x')$ so we bound $\Pr[f(z) \in S]$, where $S$ is the
set of vectors within $n(p + \delta)$ of $E$. Fixing $E$ and the $f(u_i)$, $i < m$, $f(z)$ still has
an additive term $f(u_m)$ that is uniform and independent. Hence $f(z)$ is distributed
uniformly. Thus $\Pr[f(z) \in S] = |S|2^{-n}$ and the remainder of the proof is as in
Shannon's Theorem. ∎


15.2 LIAR GAME 


Paul is trying to find a number $x \in \{1, \ldots, n\}$ from a recalcitrant and mendacious
Carole. He may ask $q$ questions of the form "Is $x \in S$?," where $S$ can be any subset
of the possibilities. The questions are asked sequentially and Paul's choice of his $i$th
question can depend on previous responses. Carole is allowed to lie, but she can
lie at most $k$ times. For which $n, q, k$ can Paul determine the number?

When $k = 0$ Paul can win exactly when $n \le 2^q$. The values $n = 100$, $q = 10$,
$k = 1$ make for an amusing parlor game. Carole is hardly a passive observer; she
may play an adversary strategy. By that we mean that she does not select an $x$ in
advance but answers consistently with at least one $x$. At the end of the game if her
answers were consistent with more than one $x$ then she has won. The game, called
the $(n, q, k)$-Liar Game, is now a perfect information game with no hidden move and
no draws. Hence either Paul or Carole has a perfect winning strategy. But who?

We describe an equivalent game, the Chip-Liar Game. There is a board with
positions $0, 1, \ldots, k$. There are $n$ chips labeled $1, \ldots, n$ which are initially at position
$k$. There are $q$ rounds. On each round Paul selects a set $S$ of the chips. Carole can
either move every chip not in $S$ one position to the left or move every chip in $S$
one position to the left. (Here position $i - 1$ is one position to the left of position $i$.
Chips moved one position to the left from position 0 are removed from the board.) At
the end of the $q$ rounds Carole wins if there is more than one chip remaining on the
board and Paul wins if there is at most one chip remaining on the board. Basically,
chip $i$ at position $j$ represents that the answer $x = i$ has already received $k - j$
lies; Paul selecting $S$ represents his asking if $x \in S$; Carole moving the chips not in
$S$ represents a Yes answer, moving the chips in $S$ represents a No answer. (In the
Chip-Liar Game Carole can remove all chips from the board while in the Liar Game
Carole must play consistently with at least one $x$. But when Carole removes all chips
from the board she automatically has lost and hence this difference does not affect
the determination of the winner.)

In the Chip-Liar Game there is no reason to place all chips at position $k$ at the start.
More generally, for $x_0, \ldots, x_k \ge 0$, we define the $(x_0, \ldots, x_k)$, $q$-Chip-Liar Game
to be the above $q$ round game with initial position consisting of $x_i$ chips at position
$i$. This, in turn, corresponds to a Liar Game in which there are $x_i$ possibilities for
which Carole is constrained to lie at most $i$ times.

Let us define $B(q, j)$ as the probability that in $q$ flips of a fair coin there are at
most $j$ heads. Of course, we have the exact formula

$$B(q, j) = 2^{-q} \sum_{i=0}^{j} \binom{q}{i}.$$
Theorem 15.2.1 If

$$\sum_{i=0}^{k} x_i B(q, i) > 1$$

then Carole wins the $(x_0, \ldots, x_k)$, $q$-Chip-Liar Game.

Corollary 15.2.2 If

$$n > \frac{2^q}{\sum_{i=0}^{k} \binom{q}{i}}$$

then Carole wins the $(n, q, k)$-Liar Game.
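
The condition of the theorem is easy to evaluate exactly. The sketch below uses rational arithmetic; the function names `B`, `weight` and `carole_wins_by_theorem` are hypothetical, and the last one reports only the sufficient condition of Theorem 15.2.1, not the true winner.

```python
from fractions import Fraction
from math import comb

def B(q: int, j: int) -> Fraction:
    """Probability of at most j heads in q flips of a fair coin."""
    return Fraction(sum(comb(q, i) for i in range(j + 1)), 2 ** q)

def weight(chips, q: int) -> Fraction:
    """Sum of x_i * B(q, i) for an initial position chips = (x_0, ..., x_k)."""
    return sum(x * B(q, i) for i, x in enumerate(chips))

def carole_wins_by_theorem(chips, q: int) -> bool:
    """True when the hypothesis of Theorem 15.2.1 holds (sufficient only)."""
    return weight(chips, q) > 1

# The parlor game n = 100, q = 10, k = 1 is the (0, 100), 10-Chip-Liar Game.
print(weight((0, 100), 10), carole_wins_by_theorem((0, 100), 10))
```

For the parlor game the weight is $100 \cdot 11/1024 = 275/256 > 1$, so Carole wins it.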


Proof [Theorem 15.2.1]. Fix a strategy for Paul. Now Carole plays randomly! That
is, at each round, after Paul has selected a set $S$ of chips Carole flips a coin: if
it comes up heads she moves every chip not in $S$ one position to the left and if it
comes up tails she moves every chip in $S$ one position to the left. For each chip $c$
let $I_c$ be the indicator random variable for $c$ remaining on the board at the end of
the game. Set $X = \sum I_c$, the number of chips remaining on the board at the end
of the game. Consider a single chip $c$. Each round Paul may have chosen $c \in S$ or
$c \notin S$ but in either case $c$ is moved to the left with probability $\frac{1}{2}$. Suppose $c$ starts at
position $i$. It remains on the board at the end of the game if and only if in the $q$ rounds
it has been moved to the left at most $i$ times. Then $E[I_c]$, the probability of this
occurring, is precisely $B(q, i)$. By linearity of expectation $E[X] = \sum_{i=0}^{k} x_i B(q, i)$.
The assumption of the theorem gives $E[X] > 1$. But then $X > 1$ must occur with
positive probability. That is, Carole must win with positive probability.

No strategy of Paul allows him to always win. But this is a perfect information
game with no draws so someone has a perfect strategy that always wins. That
someone isn't Paul, so it must be Carole. ∎


The above proof certainly illustrated the magical element of the probabilistic
method. Carole has a winning strategy but what is it? The general notion of
moving from a probabilistic existence proof to an explicit construction is called
derandomization and will be dealt with in detail in the next chapter. Here we can give
an explicit strategy. With $l$ moves remaining in the game and $y_i$ chips on position $i$
define the weight of the position as $\sum_i y_i B(l, i)$; note this is $E[Y]$, where $Y$ is the
number of chips that would remain on the board should Carole play the rest of the
game randomly. Carole's explicit strategy is to always move so as to maximize the
weight.

Consider any position with weight $W$ and any move $S$ by Paul. Let $W^y, W^n$ be
the new weights should Carole move all chips not in $S$ or all chips in $S$, respectively.
We claim $W = \frac{1}{2}(W^y + W^n)$. One argument is that by linearity this identity
reduces to the case of one chip and it then follows from the identity $B(l, j) =
\frac{1}{2}(B(l-1, j) + B(l-1, j-1))$. But we needn't actually do any calculation.
Carole's playing randomly can be thought of as first flipping a coin to decide on
her first move and then playing randomly so that $E[Y]$ is the average of the two
conditional expectations.
At the start of the game, by assumption, the weight is larger than one. Carole's
explicit strategy assures that the weight does not decrease so at the end of the game
the weight is larger than one. But at the end of the game the weight is the number of
chips remaining. Since this number is larger than one, Carole has won the game.
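
The weight-maximizing rule can be sketched in a few lines. In this illustration a position is a list $[y_0, \ldots, y_k]$ of chip counts and Paul's move is given by how many chips he selects at each position; the names `B`, `weight`, `shift` and `carole_reply` are hypothetical.

```python
from fractions import Fraction
from math import comb

def B(l: int, j: int) -> Fraction:
    """Probability of at most j heads in l fair coin flips (0 when j < 0)."""
    return Fraction(sum(comb(l, i) for i in range(j + 1)), 2 ** l)

def weight(position, l: int) -> Fraction:
    """Weight of position = [y_0, ..., y_k] with l moves remaining."""
    return sum(y * B(l, i) for i, y in enumerate(position))

def shift(position, moved):
    """Move moved[i] of the chips at position i one step to the left;
    chips moved left from position 0 leave the board."""
    new = [y - m for y, m in zip(position, moved)]
    for i in range(1, len(position)):
        new[i - 1] += moved[i]
    return new

def carole_reply(position, s, l: int):
    """Paul selects s[i] chips at each position i with l moves remaining.
    Carole keeps whichever of her two options has the larger weight."""
    not_s = [y - t for y, t in zip(position, s)]
    yes = shift(position, not_s)   # "Yes": chips outside S move left
    no = shift(position, s)        # "No": chips inside S move left
    return max(yes, no, key=lambda pos: weight(pos, l - 1))

print(carole_reply([0, 5], [0, 2], 5))
```

Because the old weight is the average of the two new weights, the larger of the two is never below the old weight, which is all the strategy needs.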

The converse of the theorem, and even the corollary, is false. Consider the Liar
Game with $n = 5$, $q = 5$ questions and $k = 1$ possible lie. In the Chip-Liar version
this is the $(0, 5)$, 5-Chip-Liar Game. Here $B(5, 1) = \frac{6}{32}$ and $5 \cdot \frac{6}{32} < 1$. Still, Carole
wins with perfect play. The problem is that Paul has no good first move. Suppose
he selects two chips as $S$ (asks "Is $x \le 2$?" in the Liar Game). Then Carole moves
the two chips one to the left (responds No) leaving the position $(2, 3)$ with four
questions remaining. As $2B(4, 0) + 3B(4, 1) = \frac{17}{16} > 1$, Carole will now win. It is
easy to check that all other moves of Paul fail. The difficulty here is that Paul was in
a position with weight $W \le 1$ but was unable to find a move such that $W^y \le 1$ and
$W^n \le 1$.


15.3 TENURE GAME 


Paul, Chair of Department, is trying to promote one of his faculty to tenure but
standing in his way is a recalcitrant and meanspirited Carole, the Provost. There
are $k$ pretenure levels, labeled $1, \ldots, k$, level 1 the highest, and a level 0, representing
tenure. For our purposes each faculty member is represented by a chip. The
$(x_1, \ldots, x_k)$-Tenure Game begins with $x_i$ chips at level $i$ for $1 \le i \le k$ and no chips
on level zero. Each year Paul presents a set $S$ of chips to Carole. Carole may either:


• Promote all chips in $S$ and fire the others or
• Promote all chips not in $S$ and fire those in $S$.


Promote, as used above, means to move from level $i$ to level $i - 1$. Fired means
just that: removing the chip from the game. If a chip reaches level 0 then Paul is the
winner. The draconian promotion or perish provision ensures that the game will end
within $k$ years with either Paul winning or Carole having successfully eliminated all
chips.


Theorem 15.3.1 If $\sum_i x_i 2^{-i} < 1$ then Carole wins the $(x_1, \ldots, x_k)$-Tenure Game.


Proof. Fix a strategy for Paul. Now Carole plays randomly! That is, at each round,
after Paul has selected a set $S$ of chips Carole flips a coin: if it comes up heads she
promotes every chip not in $S$ (and fires those in $S$) and if it comes up tails she promotes
every chip in $S$ (and fires the others). For each chip $c$ let $I_c$ be the indicator random
variable for $c$ reaching level 0. Set $X = \sum I_c$, the number of chips reaching level 0
at the end of the game. Consider a single chip $c$. Each round Paul may have chosen
$c \in S$ or $c \notin S$ but in either case $c$ is promoted with probability $\frac{1}{2}$. Suppose
$c$ starts at level $i$. It reaches level 0 if and only if
the first $i$ coin flips of Carole led to promotions for $c$. Then $E[I_c]$, the probability
of this occurring, is precisely $2^{-i}$. By linearity of expectation $E[X] = \sum_{i=1}^{k} x_i 2^{-i}$.
The assumption of the theorem gives $E[X] < 1$. But then $X < 1$, that is $X = 0$, must occur with
positive probability. That is, Carole must win with positive probability.

No strategy of Paul allows him to always win. But this is a perfect information
game with no draws so someone has a perfect strategy that always wins. That
someone isn't Paul, so it must be Carole. ∎


As with the Liar Game we may derandomize the above argument to give an explicit
strategy for Carole. With $y_i$ chips on position $i$ define the weight of the position as
$\sum_i y_i 2^{-i}$; note this is $E[Y]$, where $Y$ is the number of chips that would reach level
0 should Carole play the rest of the game randomly. Carole's explicit strategy is to
always move so as to minimize the weight. Consider any position with weight $W$ and
any move $S$ by Paul. Let $W^y, W^n$ be the new weights should Carole move all chips
not in $S$ or all chips in $S$, respectively. As in the Liar Game $W = \frac{1}{2}(W^y + W^n)$.
At the start of the game, by assumption, the weight is less than one. Carole's explicit
strategy assures that the weight does not increase so at all times the weight is smaller
than one. A chip at level 0 would add one to the weight by itself so that this never
occurs and hence Carole wins.

In the Liar Game the sufficient condition for Carole to win was not necessary 
because Paul did not always have an appropriately splitting move. Here, however, 
we have an amusing lemma. 


Lemma 15.3.2 If a set of chips has weight at least one it may be split into two parts,
each of weight at least $\frac{1}{2}$.


Proof. There must be two chips at some position $i$, otherwise the weight is less than
one. If there are two chips at position 1 simply split them. If there are two chips at
position $i > 1$ glue them together, and consider them as one superchip at position
$i - 1$. Then the proof follows by induction on the number of chips. ∎


Theorem 15.3.3 If $\sum_i x_i 2^{-i} \ge 1$ then Paul wins the $(x_1, \ldots, x_k)$-Tenure Game.


Proof. The initial weight is at least one. Applying the lemma Paul splits the chips
into two parts, each of weight at least $\frac{1}{2}$, and sets $S$ equal to one of the parts. Carole
moves all chips in one part one position to the left, doubling their weight, leaving a
new position of weight at least one. Thus the weight never goes below one. Therefore
the game cannot end with all chips having been removed (which would have weight
zero) and so it must end with a win for Paul. ∎
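
Paul's splitting strategy can be checked exhaustively on small positions. The sketch below represents a position as a list of chip levels. Note that `split` uses a greedy pass over the chips from heaviest to lightest rather than the gluing induction of Lemma 15.3.2: since every running total is a multiple of the next chip's weight, the first part reaches weight exactly $\frac{1}{2}$. All function names are hypothetical.

```python
from fractions import Fraction

def weight(levels) -> Fraction:
    """Weight of a multiset of chips given as a list of levels i >= 0."""
    return sum(Fraction(1, 2 ** i) for i in levels)

def split(levels):
    """Greedily fill the first part, heaviest chips first, until its weight
    reaches 1/2; everything else goes to the second part."""
    part, rest, acc = [], [], Fraction(0)
    for i in sorted(levels):
        if acc < Fraction(1, 2):
            part.append(i)
            acc += Fraction(1, 2 ** i)
        else:
            rest.append(i)
    return part, rest

def paul_wins(levels) -> bool:
    """True if Paul's splitting strategy beats every reply by Carole."""
    if 0 in levels:
        return True          # some chip has reached tenure
    if not levels:
        return False         # every chip has been fired
    s, t = split(levels)
    # Carole promotes one part (levels drop by one) and fires the other.
    return all(paul_wins([i - 1 for i in kept]) for kept in (s, t))

print(paul_wins([1, 2, 3, 3]), paul_wins([2, 2, 2]))
```

The first position has weight exactly one and Paul wins it against every reply; the second has weight $\frac{3}{4}$ and this particular strategy fails on it, in line with Theorem 15.3.1.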


15.4 BALANCING VECTOR GAME 


The balancing vector game is a perfect information game with two players, Pusher and 
Chooser. There is a parameter n > 1, and we shall be concerned with asymptotics in 
$n$. There are $n$ rounds, each involving vectors in $\mathbb{Z}^n$. There is a position vector $P \in
\mathbb{Z}^n$, initially set at 0. Each round has two parts. First Pusher picks $v \in \{-1, +1\}^n$.
Then Chooser either resets $P$ to $P + v$ or to $P - v$. At the end of the $n$th round the
payoff to Pusher is $|P|_\infty$, the maximal absolute value of the coordinates of $P$. Let
VAL$(n)$ denote the value of this game to Pusher, that is, the maximum payoff Pusher
can ensure when both players play optimally. Let $S_n$ denote, as usual, the sum of $n$
independent uniform $\pm 1$ random variables.


Theorem 15.4.1 If $\Pr[|S_n| > a] < n^{-1}$ then VAL$(n) \le a$.


Proof. Consider the game a win for Pusher if the final $|P|_\infty > a$. Suppose Chooser
announces that she will flip a fair coin each round to determine whether to reset $P$ as
$P + v$ or $P - v$. Let $x_i$ be the $i$th coordinate for the final value of the position vector
$P$. Let $W_i$ be the event $|x_i| > a$ and $W = \bigvee_{i=1}^{n} W_i$ so that $W$ is the event of Pusher
winning. Regardless of Pusher's strategy $x_i$ has distribution $S_n$ so that

$$\Pr[W] \le \sum_{i=1}^{n} \Pr[|S_n| > a] < n \cdot n^{-1} = 1.$$

Pusher cannot always win so Chooser always wins. ∎


Corollary 15.4.2 VAL$(n) = O(\sqrt{n \ln n})$.
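
For a concrete $n$ the hypothesis of Theorem 15.4.1 can be evaluated exactly. The following sketch, with hypothetical names `tail_count` and `chooser_guarantee`, finds the smallest integer $a$ for which $n \Pr[|S_n| > a] < 1$ using integer arithmetic only.

```python
from math import comb, log, sqrt

def tail_count(n: int, a: int) -> int:
    """Number of +-1 sequences of length n whose sum exceeds a in absolute value.
    A sequence with k entries equal to +1 has sum 2k - n."""
    return sum(comb(n, k) for k in range(n + 1) if abs(2 * k - n) > a)

def chooser_guarantee(n: int) -> int:
    """Smallest integer a with n * Pr[|S_n| > a] < 1, which by
    Theorem 15.4.1 gives VAL(n) <= a."""
    a = 0
    while n * tail_count(n, a) >= 2 ** n:
        a += 1
    return a

for n in (10, 100, 1000):
    print(n, chooser_guarantee(n), sqrt(2 * n * log(2 * n)))
```

The last column is the value obtained from the tail estimate $\Pr[|S_n| > a] < 2e^{-a^2/2n}$, which is where the $\sqrt{n \ln n}$ of the corollary comes from; the exact threshold is somewhat smaller.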


To give a lower bound on VAL(n) one wants to find a strategy for Pusher that 
wins against any Chooser. It is not sufficient to find a strategy that does well against 
a randomly playing Chooser — the Chooser is an adversary. Still, the notion of a 
randomly playing Chooser motivates the following result. 


Theorem 15.4.3 If $\Pr[|S_n| > a] > cn^{-1/2}$, where $c$ is an absolute constant, then
VAL$(n) > a$.


Corollary 15.4.4 VAL$(n) = \Omega(\sqrt{n \ln n})$ and hence VAL$(n) = \Theta(\sqrt{n \ln n})$.


Proof [Theorem 15.4.3]. Define, for $x \in \mathbb{Z}$, $0 \le i \le n$,

$$w_i(x) = \Pr[|x + S_{n-i}| > a].$$

For $P = (x_1, \ldots, x_n)$ set $w_i(P) = \sum_{1 \le j \le n} w_i(x_j)$. When $P$ is the position vector
at the end of the $i$th round, $w_i(P)$ may be interpreted as the expected number of
coordinates with absolute value greater than $a$ at the end of the game, assuming
random play by Chooser. At the beginning of the game $w_0(P) = w_0(0) > c\sqrt{n}$
by assumption. Given position $P$ at the end of round $i$, Pusher's strategy will be to
select $v \in \{-1, +1\}^n$ so that $w_{i+1}(P - v)$ and $w_{i+1}(P + v)$ are close together.

The distribution $x + S_{n-i}$ splits into $x + 1 + S_{n-i-1}$ and $x - 1 + S_{n-i-1}$ depending
on the first coin flip so that for any $i, x$,

$$w_i(x) = \frac{1}{2}\left(w_{i+1}(x + 1) + w_{i+1}(x - 1)\right).$$
Set $P = (x_1, \ldots, x_n)$, $v = (v_1, \ldots, v_n)$. For $1 \le j \le n$ set

$$\Delta_j = w_{i+1}(x_j + 1) - w_{i+1}(x_j - 1)$$

so that

$$w_{i+1}(P + v) - w_{i+1}(P - v) = \sum_{j=1}^{n} v_j \Delta_j$$

and, for $\epsilon = \pm 1$,

$$w_{i+1}(P + \epsilon v) = w_i(P) + \frac{1}{2} \epsilon \sum_{j=1}^{n} v_j \Delta_j.$$


Now we bound $|\Delta_j|$. Observe that

$$\Delta_j = \Pr[S_{n-i-1} = y] - \Pr[S_{n-i-1} = z],$$

where $y$ is the unique integer of the same parity as $n - i - 1$ in the interval $(a - (x_j +
1), a - (x_j - 1)]$ and $z$ is the same in $[-a - (x_j + 1), -a - (x_j - 1))$. Let us set

$$g(m) = \max_s \Pr[S_m = s] = \binom{m}{\lfloor m/2 \rfloor} 2^{-m} \sim \sqrt{\frac{2}{\pi m}}$$

so that $|\Delta_j| \le g(n - i - 1)$ for all $j$.
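A simple strategy for Pusher is then to reorder the coordinates so that $|\Delta_1| \ge
\cdots \ge |\Delta_n|$ and then select $v_1, \ldots, v_n \in \{-1, +1\}$ sequentially, giving $v_j\Delta_j$ the
opposite sign of $v_1\Delta_1 + \cdots + v_{j-1}\Delta_{j-1}$. (When $j = 1$ or the sum is zero, choose $v_j$
arbitrarily.) This assures

$$|v_1\Delta_1 + \cdots + v_n\Delta_n| \le |\Delta_1| \le g(n - i - 1).$$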


Let $P^i$ denote the position vector at the end of the $i$th round and $v$ Pusher's choice
for the $(i + 1)$st round. Then regardless of Chooser's choice of $\epsilon = \pm 1$,

$$w_{i+1}(P^{i+1}) = w_{i+1}(P^i + \epsilon v) \ge w_i(P^i) - \frac{1}{2}\left|\sum_{j=1}^{n} v_j \Delta_j\right| \ge w_i(P^i) - \frac{1}{2} g(n - i - 1).$$

Thus

$$w_n(P^n) \ge w_0(P^0) - \frac{1}{2} \sum_{i=0}^{n-1} g(n - i - 1).$$

Simple asymptotics give that the above sum is asymptotic to $(8n/\pi)^{1/2}$. Choosing
$c > (2/\pi)^{1/2}$, $w_n(P^n) > 0$. But $w_n(P^n)$ is simply the number of coordinates with
absolute value greater than $a$ in the final $P = P^n$. This Pusher strategy assures there
is more than zero, hence at least one such coordinate and therefore Pusher wins.
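
The sign selection at the heart of Pusher's strategy is a short greedy loop. The sketch below, with the hypothetical name `balance_signs`, processes the $\Delta_j$ in decreasing order of absolute value and keeps the running sum bounded by the largest $|\Delta_j|$.

```python
def balance_signs(deltas):
    """Choose signs v_j in {-1, +1} so that |sum of v_j * deltas[j]| is at most
    max |deltas[j]|. Coordinates are handled in decreasing order of |delta|,
    and each term is given the sign opposite to the running total."""
    order = sorted(range(len(deltas)), key=lambda j: -abs(deltas[j]))
    v = [1] * len(deltas)
    total = 0
    for j in order:
        if total * deltas[j] > 0:      # same sign as the running total: flip it
            v[j] = -1
        total += v[j] * deltas[j]
    return v, total

deltas = [3, -1, 4, 1, -5, 9, 2, -6]
v, total = balance_signs(deltas)
print(v, total)
```

Each new term has absolute value at most that of every earlier term and opposes the running total, so the total can never exceed the first (largest) term in absolute value.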
15.5 NONADAPTIVE ALGORITHMS


Let us modify the balancing game of Section 15.4 by requiring the vectors selected by
Pusher to have coordinates zero and one rather than plus and minus one. Let VAL*$(n)$
denote the value of the modified game. One can use the bounds on VAL$(n)$ to show
VAL*$(n) = \Theta(\sqrt{n \ln n})$.

In Chapter 13 we showed that any family of $n$ sets $S_1, \ldots, S_n$ on points $1, \ldots, n$
has discrepancy $O(\sqrt{n})$; that is, there is a coloring $\chi : \{1, \ldots, n\} \to \{-1, +1\}$ so
that all $|\chi(S_i)| \le c\sqrt{n}$. The proof of this result does not yield an effective algorithm
for finding such a coloring and indeed it is not known if there is a polynomial time
algorithm to do so. Suppose one asks for a nonadaptive or on-line algorithm in the
following sense. Instead of being presented the entire data of $S_1, \ldots, S_n$ at once,
one is presented with the points sequentially. At the $j$th "round" the algorithm looks
at point $j$; more specifically, at which sets $S_i$ contain $j$ or, equivalently, at the
$j$th column of the incidence matrix. At that stage the algorithm must decide how to
color $j$ and, once colored, the coloring cannot be changed. How small can we assure
$\max |\chi(S_i)|$ with such an algorithm? We may think of the points as being presented
by an adversary. Thinking of the points as their associated column vectors, Pusher
as the worst case adversary and Chooser as the algorithm, the best such an algorithm
can do is precisely VAL*$(n)$.

The requirement that an algorithm be nonadaptive is both stronger and weaker
than the requirement that an algorithm take polynomial time. Still, this lends support
to the conjecture that there is no polynomial time algorithm for finding a coloring
with all $|\chi(S_i)| \le c\sqrt{n}$.


15.6 HALF LIAR GAME 


We modify the Liar Game of Section 15.2 by limiting Carole's mendacity. If the
correct answer is Yes then Carole is now required to answer Yes. She may answer
Yes when the correct answer is No, and that would count as one of her $k$ lies. Let
$A_k(q)$ denote the maximal $n$ for which Paul wins the Half Liar Game with $n$ values,
$q$ queries, and a maximum of $k$ lies.


Theorem 15.6.1 [Dumitriu and Spencer (2004)] For each fixed $k \ge 1$,

$$A_k(q) \sim \frac{2^{q+k}}{\binom{q}{k}}.$$


While the methods below extend to arbitrary k, we give the proof only for the case 
k = 1. This case was first given by Cicalese and Mundici (2000). 


Proof. Let us fix a winning strategy for Paul with $n = A_1(q)$. This may be
described by a binary decision tree of depth $q$. For each value $i$, $1 \le i \le n$, let
$\sigma_i = (x_{i1}, \ldots, x_{iq}) \in \{Y, N\}^q$ be the string of truthful responses to Paul's queries
with that value. Let $T_i$ be the set of possible response strings given by Carole with that
value. For each $x_{ij} = N$ Carole may lie on the $j$th question, otherwise responding
truthfully. Thus $|T_i| = W(\sigma_i)$, where we define the weight $W(\sigma)$ to be one plus the
number of N's in the string $\sigma$. We cannot have any common $\sigma \in T_i \cap T_j$ as then
Carole could respond with $\sigma$ and Paul would not be able to distinguish $i, j$. Thus

$$\sum_{i=1}^{n} W(\sigma_i) \le 2^q. \qquad (15.1)$$


For a given $u$, call $\sigma$ Carole friendly if $W(\sigma) \le 1 + \frac{1}{2}(q - u)$, otherwise Paul
friendly. There are at most $2^q \Pr[S_q \le -u]$ Carole friendly $\sigma$'s, where $S_q$ is the sum
of $q$ independent $\pm 1$ random variables. By (15.1) there are at most $2^{q+1}/(q - u)$
Paul friendly $\sigma$'s. Thus

$$n \le 2^q \Pr[S_q \le -u] + \frac{2^{q+1}}{q - u}.$$

The optimization of $u$ is left as an exercise but even taking a suboptimal $u = \lfloor q^{2/3} \rfloor$
gives

$$A_1(q) = n \le (1 + o(1)) \frac{2^{q+1}}{q}.$$


For larger n Paul cannot have a winning strategy and thus Carole must have a winning 
adversary strategy. Intriguingly, this argument does not yield an explicit strategy for 
Carole. 

In the other direction let $\epsilon > 0$ be fixed and small and set $n = \lfloor (1 - \epsilon)2^{q+1}/q \rfloor$.
We will give a strategy for Paul. For $r \ge 1$ let $M_r$ denote those $\sigma \in \{Y, N\}^r$ with
at least $\frac{1}{2}(r - u)$ N's and let $f(r) = |M_r|$. For definiteness, take $u = \lfloor r^{2/3} \rfloor$.
Then $f(r) \sim 2^r$. We first massage $n$. Pick $r$ with, say, $10/\epsilon \le n/f(r) < 21/\epsilon$,
set $A = \lceil n/f(r) \rceil$, and boost $n$ to $n = Af(r)$. As the boost (which makes things
only harder for Paul) was by a factor less than $1 + (\epsilon/10)$ the new $n$ still has
$n \le (1 - \epsilon/2)2^{q+1}/q$.

Paul associates the $n = f(r)A$ values with pairs $(\sigma, j)$, $\sigma \in M_r$, $1 \le j \le A$. For
his first $r$ queries he asks for the coordinates of $\sigma$. Carole responds $\tau$, which can
differ from the truthful $\sigma$ in at most one coordinate. Thus $\tau$ has at most $\frac{1}{2}(r + u) + 1$
Y's. (Basically, these $r$ queries are nearly even splits and force Carole to answer No
nearly half the time.) What does Paul know at this moment? If Carole has not lied
the answer must be one of the $A$ values $(\tau, j)$. If Carole has lied the answer must
be one of the at most $\frac{1}{2}A(r + u + 2)$ values $(\tau^+, j)$, where $\tau^+$ is derived from $\tau$ by
shifting a Y to a N.

Set $s = q - r$, the number of remaining queries. As $A$ is bounded and $2^rA \sim
f(r)A = n = \Theta(2^q/q)$, we have $r = q - \log_2 q - O(1)$. In particular, $r \sim q$, the
first $r$ queries were the preponderance of the queries. Then

$$A \le (1 + o(1))\, n 2^{-r} \le \left(1 - \frac{\epsilon}{2} + o(1)\right) \frac{2^{q+1-r}}{q},$$

$$\frac{1}{2}A(r + u + 2) \le \left(1 - \frac{\epsilon}{2} + o(1)\right) 2^s.$$
Paul may now give further ground and allow Carole to lie in either direction for the
remaining $s$ questions. This is the $(x_0, x_1)$, $s$-Chip-Liar Game in which, with the
convention of Section 15.2 that a chip at position $i$ can absorb $i$ further lies,
$x_1 = A$ and $x_0 \le (1 - \epsilon/2 + o(1))2^s$. The endgame strategy required at this point is
given in the exercises. ∎


15.7 ENTROPY 


Let $X$ be a random variable taking values in some range $S$, and let $P(X = x)$ denote
the probability that the value of $X$ is $x$. The binary entropy of $X$, denoted by $H(X)$,
is defined by
\[ H(X) = \sum_{x \in S} P(X = x) \log_2 \frac{1}{P(X = x)}. \]

If $Y$ is another random variable, taking values in $T$, and $(X,Y)$ is the random
variable taking values in $S \times T$ according to the joint distribution of $X$ and $Y$, then
the conditional entropy of X given Y is 


H(X|Y) = H(X,Y)-H(Y). 
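
The two definitions above can be tried out numerically. The following is a minimal Python sketch, not part of the original text: the helper names `entropy`, `marginal` and `conditional_entropy`, and the small joint distribution `joint_xy`, are hypothetical choices made only for illustration.

```python
from collections import defaultdict
from math import log2

def entropy(dist):
    """Binary entropy of a distribution given as {outcome: probability}."""
    return sum(p * log2(1 / p) for p in dist.values() if p > 0)

def marginal(joint, coords):
    """Marginal of a joint distribution {tuple: prob} on the listed coordinates."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[c] for c in coords)] += p
    return dict(out)

def conditional_entropy(joint, target, given):
    """H(target | given) = H(target, given) - H(given), as in the definition."""
    return entropy(marginal(joint, target + given)) - entropy(marginal(joint, given))

# Hypothetical joint distribution of (X, Y) on {0,1} x {0,1}.
joint_xy = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}

h_x = entropy(marginal(joint_xy, (0,)))
h_y = entropy(marginal(joint_xy, (1,)))
h_xy = entropy(joint_xy)
h_x_given_y = conditional_entropy(joint_xy, (0,), (1,))
```

For this particular distribution one can compare `h_xy` with `h_x` and with `h_x + h_y`, and `h_x_given_y` with `h_x`, to see how joint, marginal and conditional entropy relate.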


In this section we prove some simple properties of the entropy function and describe 
several surprising combinatorial and geometric applications. Intuitively, the entropy 
of a random variable measures the amount of information it encodes. This provides 
an intuitive explanation to the four parts of the next simple lemma. The formal proof, 
given below, uses the properties of the functions log z and z log z, where here, and 
in the rest of this section, all logarithms are in base 2. 


Lemma 15.7.1 Let $X$, $Y$ and $Z$ be three random variables taking values in $S$, $T$ and
$U$, respectively. Then

(i) $H(X) \le \log |S|$.

(ii) $H(X,Y) \ge H(X)$.

(iii) $H(X,Y) \le H(X) + H(Y)$.

(iv) $H(X|Y,Z) \le H(X|Y)$.


Proof.

(i) Since the function $\log z$ is concave it follows, by Jensen’s Inequality, that
\[ H(X) = \sum_{i \in S} P(X = i) \log \frac{1}{P(X = i)} \le \log \Big( \sum_{i \in S} P(X = i) \frac{1}{P(X = i)} \Big) = \log |S|. \]



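(ii) By the monotonicity of $\log z$ for all $z > 0$,
\[
\begin{aligned}
H(X,Y) &= \sum_{i \in S} \sum_{j \in T} P(X = i, Y = j) \log \frac{1}{P(X = i, Y = j)} \\
&\ge \sum_{i \in S} \sum_{j \in T} P(X = i, Y = j) \log \frac{1}{P(X = i)} \\
&= \sum_{i \in S} P(X = i) \log \frac{1}{P(X = i)} = H(X).
\end{aligned}
\]

(iii) By definition
\[
\begin{aligned}
H(X) + H(Y) - H(X,Y) &= \sum_{i \in S} \sum_{j \in T} P(X = i, Y = j) \log \frac{P(X = i, Y = j)}{P(X = i) P(Y = j)} \\
&= \sum_{i \in S} \sum_{j \in T} P(X = i) P(Y = j) f(z_{ij}),
\end{aligned}
\]
where $f(z) = z \log z$ and $z_{ij} = P(X = i, Y = j) / (P(X = i) P(Y = j))$.
Since $f(z)$ is convex it follows, by Jensen’s Inequality, that the last quantity is
at least
\[ f\Big( \sum_{i \in S} \sum_{j \in T} P(X = i) P(Y = j) z_{ij} \Big) = f(1) = 0. \]

(iv) Note that
\[ H(X|Y) = H(X,Y) - H(Y) = \sum_{i \in S} \sum_{j \in T} P(X = i, Y = j) \log \frac{P(Y = j)}{P(X = i, Y = j)}. \]
Similarly
\[ H(X|Y,Z) = \sum_{i \in S} \sum_{j \in T} \sum_{k \in U} P(X = i, Y = j, Z = k) \log \frac{P(Y = j, Z = k)}{P(X = i, Y = j, Z = k)}. \]
Therefore
\[
\begin{aligned}
H(X|Y) - H(X|Y,Z) &= \sum_{i \in S} \sum_{j \in T} \sum_{k \in U} P(X = i, Y = j, Z = k) \log \frac{P(Y = j) P(X = i, Y = j, Z = k)}{P(X = i, Y = j) P(Y = j, Z = k)} \\
&= \sum_{i \in S} \sum_{j \in T} \sum_{k \in U} \frac{P(X = i, Y = j) P(Y = j, Z = k)}{P(Y = j)} f(z_{ijk}),
\end{aligned}
\]
where $f(z) = z \log z$ and
\[ z_{ijk} = \frac{P(Y = j) P(X = i, Y = j, Z = k)}{P(X = i, Y = j) P(Y = j, Z = k)}. \]
By the convexity of $f(z)$ and since
\[ \sum_{i \in S} \sum_{j \in T} \sum_{k \in U} \frac{P(X = i, Y = j) P(Y = j, Z = k)}{P(Y = j)} = 1, \]
it follows that the above quantity is at least
\[ f\Big( \sum_{i \in S} \sum_{j \in T} \sum_{k \in U} P(X = i, Y = j, Z = k) \Big) = f(1) = 0. \]
∎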


The following simple but useful fact that the entropy is subadditive has already 
been applied in Section 13.2. 


Proposition 15.7.2 Let $X = (X_1,\ldots,X_n)$ be a random variable taking values in
the set $S = S_1 \times S_2 \times \cdots \times S_n$, where each of the coordinates $X_i$ of $X$ is a random
variable taking values in $S_i$. Then
\[ H(X) \le \sum_{i=1}^{n} H(X_i). \]

Proof. This follows by induction from Lemma 15.7.1, part (iii). ∎


The above proposition is used in Kleitman, Shearer and Sturtevant (1981) to 
derive several interesting applications in Extremal Finite Set Theory, including an 
upper estimate for the maximum possible cardinality of a family of k-sets in which the 
intersection of no two is contained in a third. The basic idea in Kleitman et al. (1981) 
can be illustrated by the following very simple corollary of the last proposition. 


Corollary 15.7.3 Let $\mathcal{F}$ be a family of subsets of $\{1,2,\ldots,n\}$ and let $p_i$ denote the
fraction of sets in $\mathcal{F}$ that contain $i$. Then
\[ |\mathcal{F}| \le 2^{\sum_{i=1}^{n} H(p_i)}, \]
where $H(p) = -p \log_2 p - (1-p) \log_2 (1-p)$.


Proof. Associate each set $F \in \mathcal{F}$ with its characteristic vector $v(F)$, which is a
binary vector of length $n$. Let $X = (X_1,\ldots,X_n)$ be the random variable taking
values in $\{0,1\}^n$, where $P(X = v(F)) = 1/|\mathcal{F}|$ for all $F \in \mathcal{F}$. Clearly
$H(X) = |\mathcal{F}| \cdot (1/|\mathcal{F}|) \log |\mathcal{F}| = \log |\mathcal{F}|$ and since here $H(X_i) = H(p_i)$ for all $1 \le i \le n$,
the result follows from Proposition 15.7.2. ∎
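
As an illustration of Corollary 15.7.3, the sketch below evaluates the entropy bound for a concrete family. It is only a sanity check under stated assumptions: the function names `binary_entropy` and `entropy_bound` and the example family (all 2-subsets of {1,...,5}) are hypothetical and not taken from the text.

```python
from itertools import combinations
from math import log2

def binary_entropy(p):
    """H(p) = -p log2 p - (1 - p) log2 (1 - p), with H(0) = H(1) = 0."""
    if p in (0, 1):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def entropy_bound(family, n):
    """The bound 2^(sum_i H(p_i)), where p_i is the fraction of sets containing i."""
    size = len(family)
    exponent = sum(
        binary_entropy(sum(1 for member in family if i in member) / size)
        for i in range(1, n + 1)
    )
    return 2 ** exponent

# Hypothetical example: all 2-element subsets of {1, ..., 5}.
family = [frozenset(c) for c in combinations(range(1, 6), 2)]
bound = entropy_bound(family, 5)
```

Here every element lies in 4 of the 10 sets, so each $p_i = 0.4$ and the bound is $2^{5H(0.4)}$, comfortably above $|\mathcal{F}| = 10$. For the family of all subsets of an $n$-set every $p_i = 1/2$ and the bound is attained.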


The following interesting extension of Proposition 15.7.2 has been proved by
Shearer; see Chung, Frankl, Graham and Shearer (1986). As in that proposition, let
$X = (X_1,\ldots,X_n)$ be a random variable taking values in the set
$S = S_1 \times S_2 \times \cdots \times S_n$, where each $X_i$ is a random variable taking values in $S_i$. For a subset $I$ of
$\{1,2,\ldots,n\}$, let $X(I)$ denote the random variable $(X_i)_{i \in I}$.


Proposition 15.7.4 Let $X = (X_1,\ldots,X_n)$ and $S$ be as above. If $\mathcal{G}$ is a family of
subsets of $\{1,\ldots,n\}$ and each $i \in \{1,\ldots,n\}$ belongs to at least $k$ members of $\mathcal{G}$
then
\[ k H(X) \le \sum_{G \in \mathcal{G}} H(X(G)). \]


Proof. We apply induction on $k$. For $k = 1$, replace each set $G \in \mathcal{G}$ by a subset of it to
obtain a family $\mathcal{G}'$ whose members form a partition of $\{1,\ldots,n\}$. By Lemma 15.7.1,
part (ii), $\sum_{G \in \mathcal{G}} H(X(G)) \ge \sum_{G' \in \mathcal{G}'} H(X(G'))$ and by Lemma 15.7.1, part (iii),
$\sum_{G' \in \mathcal{G}'} H(X(G')) \ge H(X)$, supplying the desired result for $k = 1$.

Assuming the result holds for $k - 1$, we prove it for $k$ ($\ge 2$). If there is a $G \in \mathcal{G}$
with $G = \{1,\ldots,n\}$, the result follows from the induction hypothesis. Otherwise,
let $G, G'$ be two members of $\mathcal{G}$. By applying Lemma 15.7.1, part (iv), we conclude
that
\[ H\big( X(G \setminus G') \mid X(G \cap G'), X(G' \setminus G) \big) \le H\big( X(G \setminus G') \mid X(G \cap G') \big), \]
implying that
\[ H(X(G \cup G')) - H(X(G')) \le H(X(G)) - H(X(G \cap G')). \]

Therefore $H(X(G \cup G')) + H(X(G \cap G')) \le H(X(G)) + H(X(G'))$. It follows
that if we modify $\mathcal{G}$ by replacing $G$ and $G'$ by their union and intersection, then the
sum $\sum_{G \in \mathcal{G}} H(X(G))$ can only decrease. After a finite number of such modifications
we can reach the case in which one of the sets in $\mathcal{G}$ is $\{1,\ldots,n\}$, and as this case has
already been proved, this completes the proof. ∎


Corollary 15.7.5 Let $\mathcal{F}$ be a family of vectors in $S_1 \times S_2 \times \cdots \times S_n$. Let
$\mathcal{G} = \{G_1, G_2, \ldots, G_m\}$ be a collection of subsets of $N = \{1,2,\ldots,n\}$, and suppose that
each element $i \in N$ belongs to at least $k$ members of $\mathcal{G}$. For each $1 \le i \le m$ let $\mathcal{F}_i$
be the set of all projections of the members of $\mathcal{F}$ on $G_i$. Then
\[ |\mathcal{F}|^k \le \prod_{i=1}^{m} |\mathcal{F}_i|. \]

Proof. Let $X = (X_1,\ldots,X_n)$ be the random variable taking values in $\mathcal{F}$, where
$P(X = F) = 1/|\mathcal{F}|$ for all $F \in \mathcal{F}$. By Proposition 15.7.4,
\[ k H(X) \le \sum_{i=1}^{m} H(X(G_i)). \]
But $H(X) = \log_2 |\mathcal{F}|$, whereas by Lemma 15.7.1, part (i), $H(X(G_i)) \le \log_2 |\mathcal{F}_i|$,
implying the desired result. ∎
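
Both Proposition 15.7.4 and Corollary 15.7.5 can be checked on a small example. The sketch below is illustrative only: the family (12 random vectors in {0,1,2}^3, chosen with a fixed seed), the cover `cover` and the helper names `projection` and `projected_entropy` are all assumptions introduced here.

```python
import random
from collections import Counter
from itertools import product
from math import log2, prod

def projection(family, coords):
    """Set of projections of the vectors in `family` onto the coordinates `coords`."""
    return {tuple(v[c] for c in coords) for v in family}

def projected_entropy(family, coords):
    """Entropy of X(coords) when X is uniformly distributed on `family`."""
    counts = Counter(tuple(v[c] for c in coords) for v in family)
    total = len(family)
    return sum((c / total) * log2(total / c) for c in counts.values())

random.seed(0)  # fixed seed so that the sketch is reproducible
cube = list(product(range(3), repeat=3))
family = random.sample(cube, 12)       # 12 distinct vectors, a hypothetical family
cover = [(0, 1), (1, 2), (0, 2)]       # every coordinate lies in exactly k = 2 sets
k = 2

# Proposition 15.7.4: k H(X) <= sum of H(X(G)) over the cover.
lhs_entropy = k * projected_entropy(family, (0, 1, 2))
rhs_entropy = sum(projected_entropy(family, g) for g in cover)

# Corollary 15.7.5: |F|^k <= product of the projection sizes.
lhs_count = len(family) ** k
rhs_count = prod(len(projection(family, g)) for g in cover)
```

With this cover the second inequality is the discrete form of the Loomis and Whitney bound in dimension 3: a set of lattice points cannot be larger than the square root of the product of its three coordinate-plane shadows.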


Since the volume of every $d$-dimensional measurable set in $\mathbb{R}^d$ can be approximated
by the volume of an appropriate approximation of it by standard aligned boxes
in a fine enough grid, the last result has the following geometric application, proved
in Loomis and Whitney (1949) in a different manner.


Corollary 15.7.6 Let B be a measurable body in the n-dimensional Euclidean space, 
let Vol(B) denote its (n-dimensional) volume, and let Vol(B;) denote the (n — 
1)-dimensional volume of the projection of B on the hyperplane spanned by all 
coordinates besides the ith one. Then 


\[ (\mathrm{Vol}(B))^{n-1} \le \prod_{i=1}^{n} \mathrm{Vol}(B_i). \]

If $S_i = \{0,1\}$ for all $i$ in Corollary 15.7.5, we get the following statement about
set systems.


Corollary 15.7.7 [Chung et al. (1986)] Let $N$ be a finite set, and let $\mathcal{F}$ be a family
of subsets of $N$. Let $\mathcal{G} = \{G_1,\ldots,G_m\}$ be a collection of subsets of $N$, and suppose
that each element of $N$ belongs to at least $k$ members of $\mathcal{G}$. For each $1 \le i \le m$
define $\mathcal{F}_i = \{F \cap G_i : F \in \mathcal{F}\}$. Then
\[ |\mathcal{F}|^k \le \prod_{i=1}^{m} |\mathcal{F}_i|. \]
We close the section with the following application of the last result, given in 
Chung et al. (1986). 


Corollary 15.7.8 Let $\mathcal{F}$ be a family of graphs on the labeled set of vertices $\{1,\ldots,t\}$,
and suppose that for any two members of $\mathcal{F}$ there is a triangle contained in both of
them. Then
\[ |\mathcal{F}| < \frac{1}{4} 2^{\binom{t}{2}}. \]


Proof. Let $N$ be the set of all $\binom{t}{2}$ unordered pairs of vertices in $T = \{1,2,\ldots,t\}$,
and consider $\mathcal{F}$ as a family of subsets of $N$. Let $\mathcal{G}$ be the family of all subsets of
$N$ consisting of the edge sets of unions of two vertex disjoint nearly equal complete
graphs in $T$. Let
\[ s = \binom{\lfloor t/2 \rfloor}{2} + \binom{\lceil t/2 \rceil}{2} \]
denote the number of edges of such a union, and let $m$ denote the total number of
members in $\mathcal{G}$. By symmetry, each edge in $N$ lies in precisely $k = sm/\binom{t}{2}$ members
of $\mathcal{G}$. The crucial point is that every two graphs in $\mathcal{F}$ must have at least one common
edge in each $G \in \mathcal{G}$, since their intersection contains a triangle (and there are no
triangles in the complement of $G$). Therefore, in the notation of Corollary 15.7.7,
the cardinality of each $\mathcal{F}_i$ is at most $2^{s-1}$. We thus conclude that
\[ |\mathcal{F}|^{sm/\binom{t}{2}} \le \big( 2^{s-1} \big)^m, \]
implying that
\[ |\mathcal{F}| \le 2^{\binom{t}{2} - \binom{t}{2}/s}, \]
and the desired result follows, as $s < \frac{1}{2}\binom{t}{2}$. ∎
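
The counting in this proof can be checked by direct enumeration for a small $t$. The following sketch is a hypothetical illustration (the function name `balanced_unions` and the choice $t = 5$ are assumptions): it lists the members of $\mathcal{G}$, confirms that every edge lies in the same number $k = sm/\binom{t}{2}$ of them, and evaluates the exponent of the resulting bound.

```python
from itertools import combinations
from math import comb

def balanced_unions(t):
    """Edge sets of unions of two vertex-disjoint nearly equal complete graphs on {1..t}."""
    vertices = range(1, t + 1)
    members = set()
    for part in combinations(vertices, t // 2):
        rest = tuple(v for v in vertices if v not in part)
        edges = frozenset(combinations(part, 2)) | frozenset(combinations(rest, 2))
        members.add(edges)  # a set, so for even t each split is stored only once
    return members

t = 5
members = balanced_unions(t)
m = len(members)                                  # number of members of G
s = comb(t // 2, 2) + comb(t - t // 2, 2)         # edges in each member
k = s * m // comb(t, 2)                           # predicted multiplicity of each edge
edge_counts = {
    e: sum(e in g for g in members)
    for e in combinations(range(1, t + 1), 2)
}
log2_bound = comb(t, 2) - comb(t, 2) / s          # exponent in |F| <= 2^(C(t,2) - C(t,2)/s)
```

For $t = 5$ this gives $m = 10$, $s = 4$ and $k = 4$, and the exponent $10 - 10/4 = 7.5$ lies below $\binom{5}{2} - 2 = 8$, in line with the statement of the corollary.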


Simonovits and Sós conjectured that if $\mathcal{F}$ satisfies the assumptions of the last
corollary, then, in fact,
\[ |\mathcal{F}| \le \frac{1}{8} 2^{\binom{t}{2}}, \]
which, if true, is tight. This remains open. It seems plausible to conjecture that
there is some absolute constant $\epsilon > 0$, such that for any fixed graph $H$ that is not a
star-forest (i.e., a forest each connected component of which is a star), the following
holds. Let $\mathcal{F}$ be a family of graphs on the labeled set of vertices $\{1,2,\ldots,t\}$, and
suppose that for any two members of $\mathcal{F}$ there is a copy of $H$ contained in both of
them. Then
\[ |\mathcal{F}| \le \Big( \frac{1}{2} - \epsilon \Big) 2^{\binom{t}{2}}. \]

This is also open, though it is not difficult to show that it is true for every $H$ of
chromatic number at least 3, and that the conclusion fails for every star-forest $H$.


15.8 EXERCISES 


1. Suppose that in the $(x_1,\ldots,x_k)$-Tenure Game of Section 15.3 the object of
Paul is to maximize the number of faculty receiving tenure while the object
of Carole is to minimize that number. Let $v$ be that number with perfect play.
Prove $v = \lfloor \sum_{i=1}^{k} x_i 2^{-i} \rfloor$.


2. Let $A_1,\ldots,A_n \subseteq \{1,\ldots,m\}$ with $\sum_{i=1}^{n} 2^{-|A_i|} < 1$. Paul and Carole
alternately select distinct vertices from $\{1,\ldots,m\}$, Paul having the first move,
until all vertices have been selected. Carole wins if she has selected all the
vertices of some $A_i$. Paul wins if Carole does not win. Give a winning strategy
for Paul.


3. Let $\mathcal{F}$ be a family of graphs on the labeled set of vertices $\{1,2,\ldots,2t\}$, and
suppose that for any two members of $\mathcal{F}$ there is a perfect matching of $t$ edges
contained in both of them. Prove that
\[ |\mathcal{F}| \le 2^{\binom{2t}{2} - t}. \]


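4. (Han’s Inequality.) Let $X = (X_1,\ldots,X_m)$ be a random variable and let
$H(X)$ denote its entropy. For a subset $I$ of $\{1,2,\ldots,m\}$, let $X(I)$ denote the
random variable $(X_i)_{i \in I}$. For $1 \le q \le m$, define
\[ H_q(X) = \frac{1}{\binom{m-1}{q-1}} \sum_{Q \subseteq \{1,\ldots,m\},\, |Q| = q} H(X(Q)). \]
Prove that
\[ H_1(X) \ge H_2(X) \ge \cdots \ge H_m(X) = H(X). \]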


5. Let $X_i = \pm 1$, $1 \le i \le n$, be uniform and independent and let $S_n = \sum_{i=1}^{n} X_i$.
Let $0 \le p < \frac{1}{2}$. Prove
\[ \Pr[S_n \ge (1 - 2p)n] \le 2^{H(p)n} 2^{-n} \]
by computing precisely the Chernoff bound $\min_{\lambda \ge 0} E[e^{\lambda S_n}] e^{-\lambda(1-2p)n}$.
(The case $p = 0$ will require a slight adjustment in the method though the
end result is the same.)


6. (Parameter optimization in the Half Liar Game.) Find, asymptotically, the
$u = u(q)$ that minimizes $2^q \Pr[S_q \le -u] + 2^{q+1}/(q-u)$ and express the
minimal value in the form $2^{q+1}/q + (1 + o(1))g(q)$ for some function $g$.


7. Show that for $A$ fixed and $r$ sufficiently large Paul wins the
$(2^r - (r+1)A,\ A)$, $r$-Chip Liar Game.


THE PROBABILISTIC LENS: 
An Extremal Graph 


Let T (top) and B (bottom) be disjoint sets of size m and let G be a bipartite graph, 
all edges between T and B. Suppose G contains no 4-cycle. How many edges can 
G have? This is a question from Extremal Graph Theory. Surprisingly, for some m, 
we may give the precise answer. 

Suppose $m = n^2 + n + 1$ and that a projective plane $P$ of order $n$ (and hence
containing $m$ points) exists. Identify $T$ with the points of $P$ and $B$ with the lines of
$P$ and define $G = G_P$ by letting $t \in T$ be adjacent to $b \in B$ if and only if point
$t$ is on line $b$ in $P$. As two points cannot lie on two lines, $G_P$ contains no 4-cycle.
We claim that such a $G_P$ has the largest number of edges of any $G$ containing no
4-cycle and further that any $G$ containing no 4-cycle and having that many edges can
be written in the form $G = G_P$.

Suppose $G$ contains no 4-cycle. Let $b_1, b_2 \in B$ be a uniformly selected pair
of distinct elements. For $t \in T$ let $D(t)$ be the set of $b \in B$ adjacent to $t$ and
$d(t) = |D(t)|$, the degree of $t$. Let $I_t$ be the indicator random variable for $t$ being
adjacent to $b_1, b_2$. Then
\[ E[I_t] = \Pr[b_1, b_2 \in D(t)] = \binom{d(t)}{2} \Big/ \binom{m}{2}. \]
Now set
\[ X = \sum_{t \in T} I_t, \]
the number of $t \in T$ adjacent to $b_1, b_2$. Then $X \le 1$; that is, all $b_1, b_2$ have at most
one common neighbor. ($X \le 1$ is actually equivalent to $G$ containing no 4-cycle.)
Linearity of expectation gives
\[ E[X] = \sum_{t \in T} E[I_t] = \sum_{t \in T} \binom{d(t)}{2} \Big/ \binom{m}{2}. \]
Let $d = m^{-1} \sum_{t \in T} d(t)$ be the average degree. Convexity of the function $\binom{y}{2}$ gives
\[ \sum_{t \in T} \binom{d(t)}{2} \Big/ \binom{m}{2} \ge m \binom{d}{2} \Big/ \binom{m}{2}, \]
with equality if and only if all $t \in T$ have the same degree. Now
\[ 1 \ge \max X \ge E[X] \ge m \binom{d}{2} \Big/ \binom{m}{2}. \]

When $G = G_P$ all $d(t) = d$ (every line has $n + 1$ points) and $X$ is identically 1 (two
points determine precisely one line) so that the above inequalities are all equalities
and
\[ 1 = m \binom{d}{2} \Big/ \binom{m}{2}. \]
Any graph with more edges would have a strictly larger $d$ so that $1 \ge m \binom{d}{2} / \binom{m}{2}$
would fail and the graph would contain a 4-cycle.

Suppose further $G$ has the same number of edges as $G_P$ and contains no 4-cycle.
The inequalities then must be equalities and so $X = 1$ always. Define a geometry
with points $T$ and lines given by the neighbor sets of $b \in B$. As $X = 1$ any two
points determine a unique line. Reversing the roles of $T$, $B$ one also has that any two
lines must determine a unique point. Thus $G$ is generated from a projective plane.
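
The extremal example can be built explicitly when the order is a prime. The sketch below is an assumption-laden illustration rather than part of the argument: it constructs the plane over the integers modulo a prime `p` (so the order is $n = p$), using homogeneous coordinates, and the function name `projective_plane` is hypothetical.

```python
from itertools import combinations, product
from math import comb

def projective_plane(p):
    """Points/lines of the projective plane over GF(p), p prime, and the incidence sets.

    Points and lines are both represented by nonzero triples over GF(p),
    normalized so that the first nonzero coordinate equals 1.
    """
    triples = [v for v in product(range(p), repeat=3) if any(v)]
    reps = [v for v in triples if next(x for x in v if x) == 1]
    # Point t lies on line b exactly when their dot product vanishes mod p.
    incidence = {
        b: {t for t in reps if sum(x * y for x, y in zip(t, b)) % p == 0}
        for b in reps
    }
    return reps, incidence

p = 2                                   # order n = 2: the Fano plane, m = 7
points, incidence = projective_plane(p)
m = len(points)
edges = sum(len(neighbors) for neighbors in incidence.values())
common = [len(incidence[b1] & incidence[b2]) for b1, b2 in combinations(incidence, 2)]
```

For $p = 2$ the graph has $m = 7$ vertices on each side and $21$ edges, every two lines share exactly one point (so $X$ is identically 1 and there is no 4-cycle), and $m\binom{n+1}{2} = \binom{m}{2}$, which is the equality case of the inequalities above.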
Derandomization


Math is natural. Nobody could have invented the mathematical universe. It was 
there, waiting to be discovered, and it’s crazy; it’s bizarre. 
— John Conway 


As mentioned in Chapter 1, the probabilistic method supplies, in many cases, effective 
randomized algorithms for various algorithmic problems. In some cases, these 
algorithms can be derandomized and converted into deterministic ones. In this 
chapter we discuss some examples. 


16.1 THE METHOD OF CONDITIONAL PROBABILITIES 


An easy application of the basic probabilistic method implies the following statement, 
which is a special case of Theorem 2.3.1. 


Proposition 16.1.1 For every integer $n$ there exists a coloring of the edges of the
complete graph $K_n$ by two colors so that the total number of monochromatic copies
of $K_4$ is at most $\binom{n}{4} 2^{-5}$.


Indeed, $\binom{n}{4} 2^{-5}$ is the expected number of monochromatic copies of $K_4$ in a
random 2-edge-coloring of $K_n$, and hence a coloring as above exists.


The Probabilistic Method, Third Edition By Noga Alon and Joel Spencer 
Can we actually find deterministically such a coloring in time which is polynomial
in $n$? Let us describe a procedure that does, which is a special case of a general
technique called the method of conditional probabilities.

We first need to define a weight function for any partially colored $K_n$. Given a
coloring of some of the edges of $K_n$ by red and blue, we define, for each copy $K$ of
$K_4$ in $K_n$, a weight $w(K)$ as follows. If at least one edge of $K$ is colored red and
at least one edge is colored blue then $w(K) = 0$. If no edge of $K$ is colored, then
$w(K) = 2^{-5}$, and if $r \ge 1$ edges of $K$ are colored, all with the same color, then
$w(K) = 2^{r-6}$. Also define the total weight $W$ of the partially colored $K_n$ as the sum
$\sum w(K)$, as $K$ ranges over all copies of $K_4$ in $K_n$. Observe that the weight of each
copy $K$ of $K_4$ is precisely the probability that it will be monochromatic, if all the
presently uncolored edges of $K_n$ will be assigned randomly and independently one
of the two colors red and blue. Hence, by linearity of expectation, the total weight
$W$ is simply the expected number of monochromatic copies of $K_4$ in such a random
extension of the partial coloring of $K_n$ to a full coloring.

We can now describe the procedure for finding a coloring as in Proposition 16.1.1.
Order the $\binom{n}{2}$ edges of $K_n$ arbitrarily, and construct the desired two-coloring by
coloring each edge either red or blue in its turn. Suppose $e_1,\ldots,e_{i-1}$ have already
been colored, and we now have to color $e_i$. Let $W$ be the weight of $K_n$, as defined
above, with respect to the given partial coloring $c$ of $e_1,\ldots,e_{i-1}$. Similarly, let
$W_{red}$ be the weight of $K_n$ with respect to the partial coloring obtained from $c$ by
coloring $e_i$ red, and let $W_{blue}$ be the weight of $K_n$ with respect to the partial coloring
obtained from $c$ by coloring $e_i$ blue. By the definition of $W$ (and as follows from its
interpretation as an expected value),
\[ W = \frac{W_{red} + W_{blue}}{2}. \]

The color of $e_i$ is now chosen so as to minimize the resulting weight; that is, if
$W_{red} \le W_{blue}$ then we color $e_i$ red, otherwise, we color it blue. By the above equality, the
weight function never increases during the algorithm. Since at the beginning its value
is exactly $\binom{n}{4} 2^{-5}$, its value at the end is at most this quantity. However, at the end all
edges are colored, and the weight is precisely the number of monochromatic copies
of $K_4$. Thus the procedure above produces, deterministically and in polynomial time,
a 2-edge-coloring of $K_n$ satisfying the conclusion of Proposition 16.1.1.
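
The procedure just described can be sketched in a few lines of Python. This is a minimal illustration, not an optimized implementation: the function names `k4_weight`, `derandomized_coloring` and `monochromatic_k4_count` are hypothetical, vertices are labeled 0 to n-1, edges are processed in lexicographic order, and exact fractions are used so that the weight comparison is not affected by rounding.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def k4_weight(colors):
    """Probability that a K4 whose six edges carry `colors` (None = uncolored)
    becomes monochromatic when the uncolored edges are colored at random."""
    used = {c for c in colors if c is not None}
    if len(used) == 2:
        return Fraction(0)
    if not used:
        return Fraction(1, 32)               # 2 * 2^-6 = 2^-5
    r = sum(c is not None for c in colors)
    return Fraction(1, 2 ** (6 - r))         # 2^(r-6)

def derandomized_coloring(n):
    """Color the edges of K_n one by one, never increasing the total weight W."""
    coloring = {}                            # edge (u, v), u < v  ->  'R' or 'B'
    vertices = range(n)
    for u, v in combinations(vertices, 2):
        others = [x for x in vertices if x not in (u, v)]
        totals = {}
        for color in "RB":
            coloring[(u, v)] = color
            # Only the copies of K4 containing (u, v) change weight, so it is
            # enough to compare the sums over those copies.
            totals[color] = sum(
                k4_weight([coloring.get(e)
                           for e in combinations(sorted((u, v, x, y)), 2)])
                for x, y in combinations(others, 2)
            )
        coloring[(u, v)] = "R" if totals["R"] <= totals["B"] else "B"
    return coloring

def monochromatic_k4_count(n, coloring):
    return sum(
        len({coloring[e] for e in combinations(quad, 2)}) == 1
        for quad in combinations(range(n), 4)
    )

n = 8
coloring = derandomized_coloring(n)
count = monochromatic_k4_count(n, coloring)
bound = Fraction(comb(n, 4), 32)             # C(n,4) * 2^-5
```

Comparing only the copies of $K_4$ through the current edge is a shortcut assumed here for efficiency; it selects the same color as comparing the full weights $W_{red}$ and $W_{blue}$, since all other copies contribute equally to both.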

Let us describe, now, the method of conditional probabilities in a more general
setting. An instance of this method is due, implicitly, to Erdős and Selfridge (1973)
and more explicit examples appear in Spencer (1987) and in Raghavan (1988).
Suppose we have a probability space, and assume, for simplicity, that it is symmetric
and contains $2^l$ points, denoted by the binary vectors of length $l$. Let $A_1,\ldots,A_s$ be
a collection of events and suppose that $\sum_{i=1}^{s} \Pr[A_i] = k$. Thus $k$ is the expected
value of the number of events $A_i$ that hold, and hence there is a point $(\epsilon_1,\ldots,\epsilon_l)$
in the space in which at most $k$ events hold. Our objective is to find such a point
deterministically.
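For each choice of $(\epsilon_1,\ldots,\epsilon_{j-1})$ and for each event $A_i$, the conditional probability
\[ \Pr[A_i \mid \epsilon_1,\ldots,\epsilon_{j-1}] \]
of the event $A_i$ given the values of $\epsilon_1,\ldots,\epsilon_{j-1}$ is clearly the average of the two
conditional probabilities corresponding to the two possible choices for $\epsilon_j$. That is,
\[ \Pr[A_i \mid \epsilon_1,\ldots,\epsilon_{j-1}] = \frac{\Pr[A_i \mid \epsilon_1,\ldots,\epsilon_{j-1},0] + \Pr[A_i \mid \epsilon_1,\ldots,\epsilon_{j-1},1]}{2}. \]

Consequently,
\[
\begin{aligned}
\sum_{i=1}^{s} \Pr[A_i \mid \epsilon_1,\ldots,\epsilon_{j-1}]
&= \frac{1}{2} \sum_{i=1}^{s} \Pr[A_i \mid \epsilon_1,\ldots,\epsilon_{j-1},0] + \frac{1}{2} \sum_{i=1}^{s} \Pr[A_i \mid \epsilon_1,\ldots,\epsilon_{j-1},1] \\
&\ge \min \Big\{ \sum_{i=1}^{s} \Pr[A_i \mid \epsilon_1,\ldots,\epsilon_{j-1},0],\ \sum_{i=1}^{s} \Pr[A_i \mid \epsilon_1,\ldots,\epsilon_{j-1},1] \Big\}.
\end{aligned}
\]

Therefore, if the values of $\epsilon_j$ are chosen, each one in its turn, so as to minimize the
value of $\sum_{i=1}^{s} \Pr[A_i \mid \epsilon_1,\ldots,\epsilon_j]$, then the value of this sum cannot increase. Since
this sum is $k$ at the beginning, it follows that it is at most $k$ at the end. But at the end
each $\epsilon_j$ is fixed, and hence the value of this sum is precisely the number of events $A_i$
that hold at the point $(\epsilon_1,\ldots,\epsilon_l)$, showing that our procedure works.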

Note that the assumptions that the probability space is symmetric and that it has $2^l$
points can be relaxed. The procedure above is efficient provided $l$ is not too large (as
is usually the case in combinatorial examples), and, more importantly, provided the
conditional probabilities $\Pr[A_i \mid \epsilon_1,\ldots,\epsilon_j]$ can be computed efficiently for each of
the events $A_i$ and for each possible value of $\epsilon_1,\ldots,\epsilon_j$. This is, indeed, the case in
the example considered in Proposition 16.1.1. However, there are many interesting
examples where this is not the case. A trick that can be useful in such cases is the
use of pessimistic estimators, introduced by Raghavan (1988). Consider,
again, the symmetric probability space with $2^l$ points described above, and the events
$A_1,\ldots,A_s$ in it. Suppose that for each event $A_i$ and for each $0 \le j \le l$ we have a
function $f_j^i(\epsilon_1,\ldots,\epsilon_j)$ that can be efficiently computed. Assume also that
\[ f_{j-1}^i(\epsilon_1,\ldots,\epsilon_{j-1}) \ge \frac{f_j^i(\epsilon_1,\ldots,\epsilon_{j-1},0) + f_j^i(\epsilon_1,\ldots,\epsilon_{j-1},1)}{2} \tag{16.1} \]
and that $f_j^i$ is an upper bound on the conditional probabilities for the event $A_i$; that
is,
\[ f_j^i(\epsilon_1,\ldots,\epsilon_j) \ge \Pr[A_i \mid \epsilon_1,\ldots,\epsilon_j]. \tag{16.2} \]

Clearly the same inequalities hold for the sums over $i$. In this case, if in the beginning
$\sum_{i=1}^{s} f_0^i \le t$, and we choose the values of the $\epsilon_j$ so as to minimize the sum
$\sum_{i=1}^{s} f_j^i(\epsilon_1,\ldots,\epsilon_j)$ in each step, we get in the end a point $(\epsilon_1,\ldots,\epsilon_l)$ for which the
sum $\sum_{i=1}^{s} f_l^i(\epsilon_1,\ldots,\epsilon_l) \le t$. The number of events $A_i$ that hold in this point is at
most $t$. The functions $f_j^i$ in the argument above are called pessimistic estimators.

This enables us to obtain efficient algorithms in some cases where there is no known
efficient way of computing the required conditional probabilities. The following
theorem is an example; it is related to some of the results in Chapters 13 and 15.


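Theorem 16.1.2 Let $(a_{ij})_{i,j=1}^{n}$ be an $n$ by $n$ matrix of reals, where $-1 \le a_{ij} \le 1$
for all $i, j$. Then one can find, in polynomial time, $\epsilon_1,\ldots,\epsilon_n \in \{-1,1\}$ such that for
every $i$, $1 \le i \le n$,
\[ \Big| \sum_{j=1}^{n} \epsilon_j a_{ij} \Big| \le \sqrt{2n \ln(2n)}. \]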


Proof. Consider the symmetric probability space on the $2^n$ points corresponding to
the $2^n$ possible vectors $(\epsilon_1,\ldots,\epsilon_n) \in \{-1,1\}^n$. Define $\beta = \sqrt{2n \ln(2n)}$ and let
$A_i$ be the event $|\sum_{j=1}^{n} \epsilon_j a_{ij}| > \beta$. We next show that the method of conditional
probabilities with appropriate pessimistic estimators enables us to find efficiently a
point of the space in which no event $A_i$ holds.

Define $\alpha = \beta/n$ and let $G(x)$ be the function
\[ G(x) = \cosh(\alpha x) = \frac{e^{\alpha x} + e^{-\alpha x}}{2}. \]

By comparing the terms of the corresponding Taylor series it is easy to see that, for
every real $x$,
\[ G(x) \le e^{\alpha^2 x^2 / 2}, \]
with strict inequality if neither $x$ nor $\alpha$ are 0. It is also simple to check that for every
real $x$ and $y$,
\[ G(x) G(y) = \frac{G(x+y) + G(x-y)}{2}. \]
We can now define the functions $f_p^i$ that will form our pessimistic estimators. For
each $1 \le i \le n$ and for each $\epsilon_1,\ldots,\epsilon_p \in \{-1,1\}$ we define
\[ f_p^i(\epsilon_1,\ldots,\epsilon_p) = 2 e^{-\alpha\beta} G\Big( \sum_{j=1}^{p} \epsilon_j a_{ij} \Big) \prod_{j=p+1}^{n} G(a_{ij}). \]

Obviously, these functions can be efficiently computed. It remains to check that
they satisfy the conditions described in equations (16.1) and (16.2), and that the sum
$\sum_{i=1}^{n} f_0^i$ is less than 1. This is proved in the following claims.


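Claim 16.1.3 For every $1 \le i \le n$ and every $\epsilon_1,\ldots,\epsilon_{p-1} \in \{-1,1\}$,
\[ f_{p-1}^i(\epsilon_1,\ldots,\epsilon_{p-1}) = \frac{f_p^i(\epsilon_1,\ldots,\epsilon_{p-1},-1) + f_p^i(\epsilon_1,\ldots,\epsilon_{p-1},1)}{2}. \]


Proof. Put $v = \sum_{j=1}^{p-1} \epsilon_j a_{ij}$. By the definition of $f_p^i$ and by the properties of $G$,
\[
\begin{aligned}
f_{p-1}^i(\epsilon_1,\ldots,\epsilon_{p-1}) &= 2 e^{-\alpha\beta} G(v) G(a_{ip}) \prod_{j=p+1}^{n} G(a_{ij}) \\
&= 2 e^{-\alpha\beta} \frac{G(v - a_{ip}) + G(v + a_{ip})}{2} \prod_{j=p+1}^{n} G(a_{ij}) \\
&= \frac{f_p^i(\epsilon_1,\ldots,\epsilon_{p-1},-1) + f_p^i(\epsilon_1,\ldots,\epsilon_{p-1},1)}{2},
\end{aligned}
\]
completing the proof of the claim. ∎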


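Claim 16.1.4 For every $1 \le i \le n$ and every $\epsilon_1,\ldots,\epsilon_{p-1} \in \{-1,1\}$,
\[ f_{p-1}^i(\epsilon_1,\ldots,\epsilon_{p-1}) \ge \Pr[A_i \mid \epsilon_1,\ldots,\epsilon_{p-1}]. \]
Proof. Define $v$ as in the proof of Claim 16.1.3. Then
\[
\begin{aligned}
\Pr[A_i \mid \epsilon_1,\ldots,\epsilon_{p-1}]
&= \Pr\Big[ v + \sum_{j \ge p} \epsilon_j a_{ij} > \beta \Big] + \Pr\Big[ -v - \sum_{j \ge p} \epsilon_j a_{ij} > \beta \Big] \\
&= \Pr\Big[ e^{\alpha (v + \sum_{j \ge p} \epsilon_j a_{ij})} > e^{\alpha\beta} \Big] + \Pr\Big[ e^{-\alpha (v + \sum_{j \ge p} \epsilon_j a_{ij})} > e^{\alpha\beta} \Big] \\
&\le e^{\alpha v} e^{-\alpha\beta} E\Big[ \exp\Big( \alpha \sum_{j \ge p} \epsilon_j a_{ij} \Big) \Big] + e^{-\alpha v} e^{-\alpha\beta} E\Big[ \exp\Big( -\alpha \sum_{j \ge p} \epsilon_j a_{ij} \Big) \Big] \\
&= 2 e^{-\alpha\beta} G(v) \prod_{j \ge p} G(a_{ij}) = f_{p-1}^i(\epsilon_1,\ldots,\epsilon_{p-1}).
\end{aligned}
\]
This completes the proof of Claim 16.1.4. ∎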


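To establish the theorem it remains to show that $\sum_{i=1}^{n} f_0^i < 1$. Indeed, by the
properties of $G$ and by the choice of $\alpha$ and $\beta$:
\[
\begin{aligned}
\sum_{i=1}^{n} f_0^i &= \sum_{i=1}^{n} 2 e^{-\alpha\beta} \prod_{j=1}^{n} G(a_{ij}) \le \sum_{i=1}^{n} 2 e^{-\alpha\beta} \prod_{j=1}^{n} e^{\alpha^2 a_{ij}^2 / 2} \\
&\le \sum_{i=1}^{n} 2 e^{-\alpha\beta} e^{\alpha^2 n / 2} = 2n e^{\alpha^2 n / 2 - \alpha\beta} = 2n e^{-\alpha^2 n / 2} = 1.
\end{aligned}
\]

Moreover, the first inequality is strict unless $a_{ij} = 0$ for all $i, j$, whereas the second
is strict unless $a_{ij}^2 = 1$ for all $i, j$. This completes the proof of the theorem. ∎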
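
The proof translates directly into a greedy algorithm. The sketch below is one possible rendering under stated assumptions: the function name `balance_signs` is hypothetical, signs are fixed in the order $j = 1,\ldots,n$, the common factor $2e^{-\alpha\beta}$ is dropped because it does not affect which sign minimizes the sum, and floating-point arithmetic replaces exact computation.

```python
import math
import random

def balance_signs(a):
    """Choose eps_j in {-1, 1} one at a time so that the sum over i of the
    pessimistic estimators f_p^i never increases."""
    n = len(a)
    beta = math.sqrt(2 * n * math.log(2 * n))
    alpha = beta / n

    def G(x):
        return math.cosh(alpha * x)

    partial = [0.0] * n                              # partial[i] = sum_{j<p} eps_j a[i][j]
    tail = [math.prod(G(x) for x in row) for row in a]   # prod_{j>=p} G(a[i][j])
    eps = []
    for p in range(n):
        for i in range(n):
            tail[i] /= G(a[i][p])                    # now tail[i] = prod_{j>p} G(a[i][j])

        def estimate(sign):
            # Sum over i of f_p^i, up to the constant factor 2 e^{-alpha beta}.
            return sum(G(partial[i] + sign * a[i][p]) * tail[i] for i in range(n))

        sign = -1 if estimate(-1) <= estimate(1) else 1
        eps.append(sign)
        for i in range(n):
            partial[i] += sign * a[i][p]
    return eps, beta

random.seed(1)                                       # hypothetical test matrix
n = 10
a = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
eps, beta = balance_signs(a)
row_sums = [abs(sum(e * x for e, x in zip(eps, row))) for row in a]
```

Each step costs $O(n)$ evaluations of $G$ per sign, so the whole run takes $O(n^2)$ arithmetic operations; every entry of `row_sums` should then be at most `beta`, as the theorem guarantees.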
16.2 d-WISE INDEPENDENT RANDOM VARIABLES
IN SMALL SAMPLE SPACES


The complexity class NC is, roughly speaking, the class of all problems that can be 
solved in time that is polylogarithmic (in the size of the input) using a polynomial 
number of parallel processors. Several models of computation, which are a theoretical 
abstraction of the parallel computer, have been used in considering this class. The 
most common one is the EREW (Exclusive Read, Exclusive Write) PRAM, in which 
different processors are not allowed to read from or write into the same memory cell 
simultaneously. See Karp and Ramachandran (1990) for more details. 

Let n denote the size of the input. There are several simple tasks that can easily 
be performed in NC. For example, it is possible to copy the content of a cell c into 
m = n^{O(1)} cells in time O(log n), using, say, m processors. To do so, consider a
complete binary tree with m leaves and associate each of its internal vertices with a 
processor. At first, the processor corresponding to the root of the tree reads from c and 
writes its content in two cells, corresponding to its two children. Next, each of these 
two, in parallel, reads from its cell and writes its content in two cells corresponding 
to its two children. In general, at the ith step all the processors whose distance from 
the root of the tree is i — 1, in parallel, read the content of c previously stored in their 
cells and write it twice. The procedure clearly ends in time O(log m), as claimed. 
[In fact, it can be shown that O(m/ log m) processors suffice for this task but we do 
not try to optimize this number here. ] 

A similar technique can be used for computing the sum of m numbers with m
processors in time O(log m). We consider the numbers as if they lie on the leaves of
a complete binary tree with m leaves, and in the ith step each one of the processors
whose distance from the leaves is i computes, in parallel, the sum of the two numbers
previously computed by its children. The root will clearly have, in such a way, the
desired sum in time O(log m).
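
The round structure of this summation can be simulated sequentially. The sketch below is only a model of the parallel procedure, with a hypothetical function name `tree_sum`: each pass of the loop stands for one parallel step, in which all additions at the same distance from the leaves would be carried out simultaneously. It assumes a nonempty input and pads odd levels with zeros.

```python
def tree_sum(values):
    """Simulate the parallel tree summation; returns (sum, number of rounds)."""
    level = list(values)
    rounds = 0
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)          # pad so that every processor has two children
        # One parallel step: every processor adds the values of its two children.
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
        rounds += 1
    return level[0], rounds

total, rounds = tree_sum(range(1, 17))   # 16 leaves
```

With 16 leaves the simulation needs 4 rounds, matching the $O(\log m)$ bound.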

Let us now return to the edge-coloring problem of the complete graph $K_n$ discussed
in Proposition 16.1.1. By the remarks above, the problem of checking if in a given
edge-coloring there are at most $\binom{n}{4} 2^{-5}$ monochromatic copies of $K_4$ is in NC; that
is, this checking can be done in time $(\log n)^{O(1)}$ (in fact, in time $O(\log n)$) using
$n^{O(1)}$ processors. Indeed, we can first copy the given coloring $\binom{n}{4}$ times. Then we
assign a processor for each copy of $K_4$ in $K_n$, and this processor checks if its copy
is monochromatic or not (all these checkings can be done in parallel, since we have
enough copies of the coloring). Finally, we sum the number of processors whose
copies are monochromatic. Clearly we can complete the work in time $O(\log n)$ using
$n^{O(1)}$ parallel processors.

Thus we can check, in NC, if a given coloring of $K_n$ satisfies the assertion
of Proposition 16.1.1. Can we find such a coloring deterministically in NC? The
method described in the previous section does not suffice, as the edges have been
colored one by one, so the procedure is sequential and requires time $\Omega(n^2)$. However,
it turns out that in fact we can find, in NC, a coloring with the desired properties by
applying a method that relies on a technique first suggested by Joffe (1974), and later
developed by many researchers. This method is a general technique for converting
randomized algorithms whose analysis only depends on d-wise rather than fully
independent random choices (for some constant d) into deterministic (and in many
cases also parallel) ones. Our approach here follows the one of Alon, Babai and Itai
(1986), but for simplicity we only consider here the case of random variables that
take the two values 0, 1 with equal probability.

The basic idea is to replace an exponentially large sample space by one of polynomial
size. If a random variable on such a space takes a certain value with positive
probability, then we can find a point in the sample space in which this happens simply
by deterministically checking all the points. This can be done with no loss of time
by using a polynomial number of parallel processors. Note that for the edge-coloring
problem considered in Proposition 16.1.1, 6-wise independence of the random variables
corresponding to the colors of the edges suffices, since this already gives a
probability of $2^{-5}$ for each copy of $K_4$ to be monochromatic, and hence gives the
required expected value of monochromatic copies. Therefore, for this specific example,
it suffices to construct a sample space of size $n^{O(1)}$ and $\binom{n}{2}$ random variables in
it, each taking the values 0 and 1 with probability $\frac{1}{2}$, such that each 6 of the random
variables are independent.

Small sample spaces with many d-wise independent 0,1 random variables in 
them can be constructed from any linear error correcting code with appropriate 
parameters. The construction we describe here is based on the binary BCH codes 
[see, e.g., MacWilliams and Sloane (1977)]. 


Theorem 16.2.1 Suppose $n = 2^k - 1$ and $d = 2t + 1$. Then there exists a symmetric
probability space $\Omega$ of size $2(n+1)^t$ and $d$-wise independent random variables
$y_1,\ldots,y_n$ over $\Omega$ each of which takes the values 0 and 1 with probability $\frac{1}{2}$.

The space and the variables are explicitly constructed, given a representation of
the field $F = GF(2^k)$ as a $k$-dimensional algebra over $GF(2)$.


Proof. Let $x_1,\ldots,x_n$ be the $n$ nonzero elements of $F$, represented as column vectors
of length $k$ over $GF(2)$. Let $H$ be the following $1 + kt$ by $n$ matrix over $GF(2)$.
\[
H = \begin{pmatrix}
1 & 1 & \cdots & 1 \\
x_1 & x_2 & \cdots & x_n \\
x_1^3 & x_2^3 & \cdots & x_n^3 \\
\vdots & \vdots & & \vdots \\
x_1^{2t-1} & x_2^{2t-1} & \cdots & x_n^{2t-1}
\end{pmatrix}
\]

This is the parity check matrix of the extended binary BCH code of length $n$ and
designed distance $2t + 2$. It is well known that any $d = 2t + 1$ columns of $H$ are
linearly independent over $GF(2)$. For completeness, we present the proof in the next
lemma.


Lemma 16.2.2 Any set of $d = 2t + 1$ columns of $H$ is linearly independent over
$GF(2)$.

Proof. Let $J \subset \{1,2,\ldots,n\}$ be a subset of cardinality $|J| = 2t + 1$ of the set of
indices of the columns of $H$. Suppose that $\sum_{j \in J} z_j H_j = 0$, where $H_j$ denotes the
$j$th column of $H$ and $z_j \in GF(2)$. To complete the proof we must show that $z_j = 0$
for all $j \in J$. By the assumption,
\[ \sum_{j \in J} z_j x_j^i = 0 \tag{16.3} \]
for $i = 0$ and for every odd $i$ satisfying $1 \le i \le 2t - 1$. Suppose, now, that
$a = 2^b l$, where $l \le 2t - 1$ is an odd number. By squaring the equation (16.3) $b$
times, where $i = l$, using the fact that $(u + v)^2 = u^2 + v^2$ (mod 2) and the fact
that since each $z_j$ is either 0 or 1, the equality $z_j = z_j^2$ holds for all $j$, we conclude
that equation (16.3) holds for $i = a$. Consequently, (16.3) holds for all $i$, $0 \le i \le 2t$.
This is a homogeneous system of $2t + 1$ linear equations in $2t + 1$ variables. The
matrix of the coefficients is a Vandermonde matrix, which is nonsingular. Thus the
only solution is the trivial one $z_j = 0$ for all $j \in J$, completing the proof of the
lemma. ∎


Returning to the proof of the theorem, we define $\Omega = \{1,2,\ldots,2(n+1)^t\}$, and
let $A = (a_{ij})$, $i \in \Omega$, $1 \le j \le n$ be the (0,1)-matrix whose $2(n+1)^t = 2^{kt+1}$ rows
are all the linear combinations (over $GF(2)$) of the rows of $H$. The sample space $\Omega$
is now endowed with the uniform probability measure, and the random variable $y_j$ is
defined by the formula $y_j(i) = a_{ij}$ for all $i \in \Omega$, $1 \le j \le n$.

It remains to show that the variables $y_j$ are $d$-wise independent, and that each
of them takes the values 0 and 1 with equal probability. For this we have to show
that for every set $J$ of up to $d$ columns of $A$, the rows of the $|\Omega|$ by $|J|$ submatrix
$A_J = (a_{ij})$, $i \in \Omega$, $j \in J$ take on each of the $2^{|J|}$ (0,1)-vectors of length $|J|$ equally
often. However, by Lemma 16.2.2 the columns of the corresponding submatrix $H_J$
of $H$ are linearly independent. The number of rows of $A_J$ that are equal to any
given vector is precisely the number of linear combinations of the rows of $H_J$ that
are equal to this vector. This number is the number of solutions of a system of
$|J|$ linearly independent linear equations in $kt + 1$ variables, which is, of course,
$2^{kt+1-|J|}$, independent of the vector of free coefficients. This completes the proof
of the theorem. ∎
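
The construction in this proof is small enough to carry out by machine for small parameters. The following Python sketch is an illustration under explicit assumptions: field elements of $GF(2^k)$ are encoded as $k$-bit integers, the table `IRREDUCIBLE` of reduction polynomials covers only $k \le 4$, and all function names are hypothetical. Rows of $H$ and $A$ are stored as $n$-bit masks, with bit $j$ holding the entry in column $j+1$.

```python
from collections import Counter
from itertools import combinations

# Irreducible polynomials over GF(2) as bit masks (an assumed small table):
# x^2 + x + 1, x^3 + x + 1, x^4 + x + 1.
IRREDUCIBLE = {2: 0b111, 3: 0b1011, 4: 0b10011}

def gf_mul(a, b, k):
    """Multiply a and b in GF(2^k); elements are k-bit integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> k:                      # degree reached k: reduce
            a ^= IRREDUCIBLE[k]
    return result

def gf_pow(a, e, k):
    result = 1
    for _ in range(e):
        result = gf_mul(result, a, k)
    return result

def bch_sample_space(k, t):
    """Return n and the 2^(kt+1) rows of A (all GF(2)-combinations of rows of H)."""
    n = 2 ** k - 1
    elements = range(1, n + 1)          # the nonzero field elements x_1, ..., x_n
    h_rows = [2 ** n - 1]               # the all-ones row of H
    for e in range(1, 2 * t, 2):        # odd powers 1, 3, ..., 2t - 1
        powers = [gf_pow(x, e, k) for x in elements]
        for bit in range(k):            # each power contributes k binary rows
            h_rows.append(sum(((p >> bit) & 1) << j for j, p in enumerate(powers)))
    rows = [0]
    for h in h_rows:                    # doubling: add h to every combination so far
        rows += [r ^ h for r in rows]
    return n, rows

def is_d_wise_independent(n, rows, d):
    """Check that every d columns take each 0/1 pattern equally often."""
    expected = len(rows) // 2 ** d
    for subset in combinations(range(n), d):
        counts = Counter(tuple((r >> j) & 1 for j in subset) for r in rows)
        if len(counts) != 2 ** d or set(counts.values()) != {expected}:
            return False
    return True

n, rows = bch_sample_space(3, 1)        # k = 3, t = 1: n = 7, d = 3, 16 sample points
```

For $k = 3$, $t = 1$ the space has $2(n+1)^t = 16$ points and the seven variables are 3-wise independent. They are not 4-wise independent: columns whose field elements sum to zero, such as those of 1, 2, 4 and 7, are linearly dependent together with the all-ones row.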


Theorem 16.2.1 supplies an efficient way of constructing, for every fixed $d$ and
every $n$, a sample space of size $O(n^{\lfloor d/2 \rfloor})$ and $n$ $d$-wise independent random variables
in it, each taking the values 0 and 1 with equal probability. In particular, we can use
such a space of size $O\big(\binom{n}{2}^3\big) = O(n^6)$ for finding a coloring as in Proposition 16.1.1
in NC. Several other applications of Theorem 16.2.1 appear in the paper of Alon et
al. (1986).
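
To make the construction concrete, here is a minimal Python sketch of the simplest case $t = 1$ (so $d = 3$). It assumes that the nonzero field elements $x_j$ are identified with the $k$-bit expansions of $j = 1, \ldots, n$, which is harmless here because for $t = 1$ only the additive structure is used. The function names are ours, and the brute-force check is meant for tiny $k$ only.

```python
from itertools import combinations


def sample_space(k):
    """Rows of A for t = 1: all GF(2)-combinations of the rows of H.

    H has k + 1 rows and n = 2**k - 1 columns; column j is (1, bits of j).
    A combination is selected by a bit c0 (all-ones row) and a k-bit mask c.
    """
    n = 2 ** k - 1
    rows = []
    for c0 in (0, 1):
        for c in range(2 ** k):
            # y_j = c0 + <c, bits of j> over GF(2)
            rows.append(tuple((c0 + bin(c & j).count("1")) % 2
                              for j in range(1, n + 1)))
    return rows


def is_d_wise_independent(rows, d):
    """Check that every set of at most d coordinates is uniform on {0,1}^|J|."""
    n = len(rows[0])
    for size in range(1, d + 1):
        for cols in combinations(range(n), size):
            counts = {}
            for row in rows:
                key = tuple(row[j] for j in cols)
                counts[key] = counts.get(key, 0) + 1
            if len(counts) != 2 ** size or len(set(counts.values())) != 1:
                return False
    return True
```

For $k = 3$ this yields $2(n + 1) = 16$ sample points carrying $n = 7$ variables that are 3-wise but not 4-wise independent.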

It is natural to ask if the size $O(n^{\lfloor d/2 \rfloor})$ can be improved. We next show that this
size is optimal, up to a constant factor (depending on $d$).

Let us call a random variable almost constant if it attains a single value with
probability 1. Let $m(n, d)$ denote the function defined by

$$m(n, d) = \sum_{j=0}^{d/2} \binom{n}{j} \quad \text{if $d$ is even}$$

and

$$m(n, d) = \sum_{j=0}^{(d-1)/2} \binom{n}{j} + \binom{n-1}{(d-1)/2} \quad \text{if $d$ is odd.}$$

Observe that for every fixed $d$, $m(n, d) = \Omega(n^{\lfloor d/2 \rfloor})$.


Proposition 16.2.3 If the random variables $y_1, \ldots, y_n$ over the sample space $\Omega$ are
$d$-wise independent and none of them is almost constant then $|\Omega| \ge m(n, d)$.


Note that we assume here neither that $\Omega$ is a symmetric space nor that the variables
$y_j$ are $(0, 1)$-variables.


Proof. Clearly we may assume that the expected value of each $y_j$ is 0 [since
otherwise we can replace $y_j$ by $y_j - E[y_j]$]. For each subset $S$ of $\{1, \ldots, n\}$, define
$\alpha_S = \prod_{j \in S} y_j$. Observe that since no $y_j$ is almost constant and since the variables
are $d$-wise independent,

$$E[\alpha_S \alpha_S] = \prod_{j \in S} \mathrm{Var}[y_j] > 0 \qquad (16.4)$$

for all $S$ satisfying $|S| \le d$. Similarly, for all $S$ and $T$ satisfying $|S \cup T| \le d$ and
$S \ne T$ we have

$$E[\alpha_S \alpha_T] = \prod_{j \in S \cap T} \mathrm{Var}[y_j] \prod_{j \in (S \cup T) \setminus (S \cap T)} E[y_j] = 0. \qquad (16.5)$$

Let $S_1, \ldots, S_m$, where $m = m(n, d)$, be subsets of $\{1, \ldots, n\}$ such that the union
of each two is of size at most $d$. [Take all subsets of size at most $d/2$, and if $d$ is odd
add all the subsets of size $(d + 1)/2$ containing 1.]

To complete the proof, we show that the $m$ functions $\alpha_{S_j}$ (considered as real
vectors of length $|\Omega|$) are linearly independent. This implies that $|\Omega| \ge m = m(n, d)$,
as stated in the proposition.

To prove linear independence, suppose $\sum_{j=1}^{m} c_j \alpha_{S_j} = 0$. Multiplying by $\alpha_{S_i}$ and
computing expected values we obtain, by (16.5),

$$0 = \sum_{j=1}^{m} c_j E[\alpha_{S_j} \alpha_{S_i}] = c_i E[\alpha_{S_i} \alpha_{S_i}].$$

This implies, by (16.4), that $c_i = 0$ for all $i$. The required linear independence
follows, completing the proof. ∎
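
As a quick numerical companion to the proposition, the following sketch evaluates $m(n, d)$ directly from its definition. The function name `m` simply mirrors the notation of the text; nothing else is assumed.

```python
from math import comb


def m(n, d):
    """Lower bound m(n, d) on the size of a d-wise independent sample space."""
    half = d // 2  # equals d/2 for even d and (d-1)/2 for odd d
    total = sum(comb(n, j) for j in range(half + 1))
    if d % 2 == 1:
        # subsets of size (d+1)/2 that contain the element 1
        total += comb(n - 1, half)
    return total
```

For example, $m(7, 3) = 14$, while the construction of Theorem 16.2.1 with $t = 1$ uses $2(n + 1)^t = 16$ points for $n = 7$, so the two agree up to a small constant factor.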

The last proposition shows that the size of a sample space with $n$ $d$-wise independent
nontrivial random variables can be polynomial in $n$ only when $d$ is fixed.
However, as shown by Naor and Naor (1990), if we only require the random variables
to be almost $d$-wise independent, the size can be polynomial even when $d = \Omega(\log n)$.
Such sample spaces and random variables, which can be constructed explicitly in several
ways, have various interesting applications in which almost $d$-wise independence
suffices. More details appear in Naor and Naor (1990) and in Alon, Goldreich, Håstad
and Peralta (1990).


16.3 EXERCISES 


1. Let $A_1, \ldots, A_n \subseteq \{1, \ldots, m\}$ with $\sum_{i=1}^{n} 2^{1-|A_i|} < 1$. Prove there exists
a two-coloring $\chi : \{1, \ldots, m\} \to \{0, 1\}$ with no $A_i$ monochromatic. With
$m = n$ give a deterministic algorithm to find such a $\chi$ in polynomial time.


2. Describe a deterministic algorithm that, given $n$, constructs, in time polynomial
in $n$, a family $\mathcal{F}$ of $n^{10}$ subsets of the set $N = \{1, 2, \ldots, n\}$, where each $F \in \mathcal{F}$
is of size at most $10 \log n$ and for every family $\mathcal{G}$ of $n$ subsets each of cardinality
$n/2$ of $N$, there is an $F \in \mathcal{F}$ that intersects all members of $\mathcal{G}$.


THE PROBABILISTIC LENS: 


Crossing Numbers, 
Incidences, Sums 
and Products 


In this lens we start with a simple result in graph theory, whose proof is probabilistic, 
and then describe some of its fascinating consequences in combinatorial geometry and 
combinatorial number theory. Some versions of most of these seemingly unrelated 
consequences have been proved before, in a far more complicated manner. Before 
the discovery of the new proofs shown here, the only clue that there might be a 
connection between all of them has been the fact that Endre Szemerédi is one of the 
coauthors of each of the papers providing the first proofs. 

An embedding of a graph G = (V, E) in the plane is a planar representation of 
it, where each vertex is represented by a point in the plane, and each edge uv is 
represented by a curve connecting the points corresponding to the vertices u and v. 
The crossing number of such an embedding is the number of pairs of intersecting 
curves that correspond to pairs of edges with no common endpoints. The crossing 
number cr(G) of G is the minimum possible crossing number in an embedding of 
it in the plane. The following theorem was proved by Ajtai, Chvátal, Newborn and
Szemerédi (1982) and, independently, by Leighton. Here we describe a very short
probabilistic proof.


Theorem 1 The crossing number of any simple graph $G = (V, E)$ with $|E| \ge 4|V|$
is at least $|E|^3 / 64|V|^2$.

Proof. By Euler’s formula any simple planar graph with $n \ge 3$ vertices has at most
$3n - 6$ edges, implying that any simple planar graph with $n$ vertices has at most $3n$
edges. Therefore the crossing number of any simple graph with $n$ vertices and $m$
edges is at least $m - 3n$. Let $G = (V, E)$ be a graph with $|E| \ge 4|V|$ embedded in
the plane with $t = cr(G)$ crossings. Let $H$ be the random induced subgraph of $G$
obtained by picking each vertex of $G$, randomly and independently, to be a vertex of
$H$ with probability $p$ (where $p$ will be chosen later). The expected number of vertices
of $H$ is $p|V|$, the expected number of its edges is $p^2|E|$, and the expected number
of crossings in its given embedding is $p^4 t$, implying that the expected value of its
crossing number is at most $p^4 t$. Therefore $p^4 t \ge p^2|E| - 3p|V|$, implying that

$$cr(G) = t \ge \frac{|E|}{p^2} - \frac{3|V|}{p^3}.$$

Without trying to optimize the constant factor, substitute $p = 4|V|/|E|$ ($\le 1$), to get
the desired result. ∎
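
The final substitution can be checked with exact arithmetic. The sketch below, whose function name is ours, evaluates $|E|/p^2 - 3|V|/p^3$ at $p = 4|V|/|E|$ and can be compared with $|E|^3/64|V|^2$.

```python
from fractions import Fraction


def crossing_lower_bound(n_vertices, n_edges):
    """Evaluate |E|/p^2 - 3|V|/p^3 at p = 4|V|/|E| (needs |E| >= 4|V|)."""
    if n_edges < 4 * n_vertices:
        raise ValueError("Theorem 1 assumes |E| >= 4|V|")
    p = Fraction(4 * n_vertices, n_edges)  # p <= 1 by the assumption above
    return n_edges / p ** 2 - 3 * n_vertices / p ** 3
```

For instance, for 10 vertices and 45 edges the function returns exactly $45^3/6400$, the value promised by the theorem.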


Székely (1997) noticed that this result can be applied to obtain a surprisingly 
simple proof of a result of Szemerédi and Trotter in combinatorial geometry. The 
original proof is far more complicated. 


Theorem 2 Let $P$ be a set of $n$ distinct points in the plane, and let $L$ be a set of
$m$ distinct lines. Then the number of incidences between the members of $P$ and
those of $L$ (i.e., the number of pairs $(p, \ell)$ with $p \in P$, $\ell \in L$ and $p \in \ell$) is at most
$c\,(m^{2/3} n^{2/3} + m + n)$, for some absolute constant $c$.


Proof. We may and will assume that every line in $L$ is incident with at least one of
the points of $P$. Denote the number of incidences by $I$. Let $G = (V, E)$ be the graph
whose vertices are all members of $P$, where two are adjacent if and only if they are
consecutive points of $P$ on some line in $L$. Clearly $|V| = n$ and $|E| = I - m$. Note
that $G$ is already given embedded in the plane, where the edges are represented by
segments of the corresponding lines in $L$. In this embedding, every crossing is an
intersection point of two members of $L$, implying that $cr(G) \le \binom{m}{2} \le \frac{1}{2} m^2$. By
Theorem 1, either $I - m = |E| < 4|V| = 4n$, that is, $I \le m + 4n$, or

$$\frac{m^2}{2} \ge cr(G) \ge \frac{(I - m)^3}{64 n^2},$$

implying that $I \le (32)^{1/3} m^{2/3} n^{2/3} + m$. In both cases $I \le 4\,(m^{2/3} n^{2/3} + m + n)$,
completing the proof. ∎
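
To see the bound in action, the following sketch counts incidences by brute force on a small grid-type configuration and compares the count with $4\,(m^{2/3} n^{2/3} + m + n)$. The particular point and line sets are our own choice of example, not taken from the text.

```python
def grid_incidences(k):
    """Count incidences for a grid-type example (an illustrative instance).

    Points: (x, y) with 0 <= x < k and 0 <= y < 2*k*k.
    Lines:  y = a*x + b with 0 <= a < k and 0 <= b < k*k.
    """
    points = {(x, y) for x in range(k) for y in range(2 * k * k)}
    lines = [(a, b) for a in range(k) for b in range(k * k)]
    # A non-vertical line meets each column x in exactly one point,
    # so it suffices to test the point (x, a*x + b) for every x.
    incidences = sum((x, a * x + b) in points
                     for (a, b) in lines for x in range(k))
    return len(points), len(lines), incidences


def theorem2_bound(n, m):
    """The bound 4 (m^(2/3) n^(2/3) + m + n) obtained in the proof."""
    return 4 * ((m * n) ** (2 / 3) + m + n)
```

With `k = 3` there are 54 points, 27 lines and 81 incidences, comfortably below the bound.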


An analogous argument shows that the maximum possible number of incidences
between a set of $n$ points and a set of $m$ unit circles in the plane does not exceed
$O(m^{2/3} n^{2/3} + m + n)$ and this implies that the number of unit distances determined
by a set of $n$ points in the plane is at most $O(n^{4/3})$. While the above upper bound
for the number of incidences of points and lines is sharp, up to a constant factor, an
old conjecture of Erdős asserts that the maximum possible number of unit distances
determined by a set of $n$ points in the plane is at most $c_\varepsilon n^{1+\varepsilon}$ for any $\varepsilon > 0$. The
$O(n^{4/3})$ estimate is, however, the best known upper bound and was first proved by
Spencer, Szemerédi and Trotter in a far more complicated way.

Elekes (1997) found several applications of Theorem 2 in additive number theory. 
Here, too, the proofs are amazingly simple. Here is a representative result. 


Theorem 3 For any three sets $A$, $B$ and $C$ of $s$ real numbers each,

$$|A \cdot B + C| = |\{ab + c : a \in A, b \in B, c \in C\}| \ge \Omega(s^{3/2}).$$

Proof. Put $R = A \cdot B + C$, $|R| = r$ and define

$$P = \{(a, t) : a \in A, t \in R\}, \qquad L = \{y = bx + c : b \in B, c \in C\}.$$

Thus $P$ is a set of $n = sr$ points in the plane, $L$ is a set of $m = s^2$ lines in the plane,
and each line $y = bx + c$ in $L$ is incident with $s$ points of $P$, that is, with all the points
$\{(a, ab + c) : a \in A\}$. Therefore, by Theorem 2, $s^3 \le 4\,(s^{4/3}(sr)^{2/3} + sr + s^2)$,
implying that $r \ge \Omega(s^{3/2})$, as needed. ∎
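
For small sets the quantity in Theorem 3 is easy to compute directly. The helper below (its name is ours) builds $A \cdot B + C$ as a Python set so that its size can be compared with $s^{3/2}$; it says nothing about the implied constant.

```python
def product_sum_set(A, B, C):
    """Return A.B + C = {a*b + c : a in A, b in B, c in C}."""
    return {a * b + c for a in A for b in B for c in C}
```

For $A = B = C = \{1, 2, 3, 4\}$ the result is $\{2, 3, \ldots, 20\}$, of size 19, compared with $s^{3/2} = 8$.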


The same method implies that for every set $A$ of $n$ reals, either $|A + A| \ge \Omega(n^{5/4})$
or $|A \cdot A| \ge n^{5/4}$, greatly improving and simplifying a result of Erdős and Szemerédi.


17


Graph Property Testing


‘Call the first witness,’ said the King; and the White Rabbit blew three blasts on
the trumpet, and called out, ‘First witness!’


— from Alice in Wonderland, by Lewis Carroll 


17.1 PROPERTY TESTING 


Property testers are fast randomized algorithms for distinguishing between combinatorial
structures that satisfy a certain property, and ones that are far from satisfying it.
The basic algorithmic task in this area is to design a randomized algorithm that, given
a combinatorial structure $S$, can distinguish with high probability between the case
that $S$ satisfies a prescribed property $\mathcal{P}$ and the case that $S$ is $\varepsilon$-far from satisfying $\mathcal{P}$.
Here $S$ is said to be $\varepsilon$-far from satisfying $\mathcal{P}$ if an $\varepsilon$-fraction of its representation should
be modified in order to turn it into a structure that satisfies $\mathcal{P}$. The main objective is
to design randomized algorithms that look at a very small portion of the input
and, using this information, distinguish with high probability between the above two
cases. Such algorithms are called testers for the property $\mathcal{P}$.

Preferably, a tester should look at a portion of the input whose size is a function of
$\varepsilon$ only. The general notion of property testing was first formulated by Rubinfeld and
Sudan (1996), who were motivated by the study of various algebraic properties such
as linearity of functions. Property testing is also motivated by questions in program
checking, computational learning, approximation algorithms and probabilistically
checkable proofs, as well as by the need to access large data sets, like the graph of
the Internet. The investigation of the subject relies heavily on probabilistic methods.

The main focus of this chapter is on testing properties of graphs. In this case a
graph $G$ on $n$ vertices is said to be $\varepsilon$-far from satisfying a property $\mathcal{P}$ if one needs to
add to or delete from $G$ at least $\varepsilon n^2$ edges in order to turn it into a graph satisfying
$\mathcal{P}$. Here we assume that the tester can query an oracle whether a pair of vertices, $i$
and $j$, are adjacent in the input graph $G$. If the graph satisfies the property, then the
tester has to accept with probability at least, say, $2/3$, and if it is $\varepsilon$-far from satisfying
it, then the algorithm has to reject with probability at least $2/3$.

The study of the notion of testability for combinatorial structures, and mainly for
labeled graphs, was introduced by Goldreich, Goldwasser and Ron (1998). They
showed that many natural graph properties, such as $k$-colorability, having a large
clique or having a large cut, admit a tester whose query complexity [i.e., the number
of oracle queries of type “does $(i, j)$ belong to $E(G)$”] as well as total running
time can be upper bounded by a function of $\varepsilon$ that is independent of the size of the
input. We call properties having such efficient testers, that is, testers whose query
complexity is a function of $\varepsilon$ only, testable. In general, a property tester may have a
small probability of accepting graphs that are $\varepsilon$-far from satisfying the tested property,
as well as a small probability of rejecting graphs satisfying the property. In this case
the tester is said to have two-sided error. If the tester accepts graphs satisfying the
property with probability 1, then the tester is said to have one-sided error.

It is worth noting that the model of graph property testing described here is often 
referred to as the dense graph model. Other models of graph property testing have also 
been investigated; see, for example, Goldreich and Ron (2002). For further reading 
and pointers on testing properties of graphs and other combinatorial structures the 
reader is referred to the surveys by Goldreich (1999), Fischer (2001), Ron (2001), 
Alon and Shapira (2006) and their references. 


17.2 TESTING COLORABILITY 


Although the computational problem of deciding whether a given graph is $k$-colorable
is NP-complete for every fixed $k \ge 3$, it turns out that, somewhat surprisingly, for
every fixed $\varepsilon > 0$ there is an efficient algorithm for distinguishing between graphs
on $n$ vertices that are $k$-colorable, and graphs from which one has to delete at least
$\varepsilon n^2$ edges to make them $k$-colorable. This result, mentioned already in Alon, Duke,
Lefmann, Rödl and Yuster (1994), follows from the fact that the property of being
$k$-colorable is testable, as proved implicitly in Rödl and Duke (1985) and explicitly
(with a far better dependence on the parameter $\varepsilon$) in Goldreich et al. (1998). Indeed,
as we show in this section, if a graph $G = (V, E)$ is $\varepsilon$-far from being $k$-colorable,
then an induced subgraph of it on a randomly chosen set of $c(k)/\varepsilon^2$ vertices is not
$k$-colorable with high probability. This is proved in Alon and Krivelevich (2002),
with $c(k) = 36k \ln k$, building on the work of Goldreich et al. (1998), who showed
that a random set of $O(k^2 \ln k/\varepsilon^3)$ vertices suffices. Note that the above supplies
a very simple tester with one-sided error for testing $k$-colorability: consider the
induced subgraph on a randomly chosen set of $36k \ln k/\varepsilon^2$ vertices, and accept iff
this subgraph is $k$-colorable. Obviously, every $k$-colorable graph is accepted by this
procedure, and graphs that are $\varepsilon$-far from being $k$-colorable are likely to be rejected.
Note also that the validity of this statement implies the nontrivial fact that every
graph that is $\varepsilon$-far from being $k$-colorable contains a small witness (for being
non-$k$-colorable), that is, a subgraph on only $c(\varepsilon, k) \le O(k \ln k/\varepsilon^2)$ vertices which is not
$k$-colorable. The existence of some such function $c(\varepsilon, k)$ was conjectured by
Erdős and first proved by Rödl and Duke [for some extremely fast growing function
$c(\varepsilon, k)$ of $\varepsilon$ and $k$; see Rödl and Duke (1985)]. In this section we describe the
improved $c(k)/\varepsilon^2$ bound. For simplicity, we present the proof only for $k = 3$; the
proof for the general case is essentially identical. Throughout the proof we omit all
floor and ceiling signs whenever these are not crucial.
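
The one-sided tester described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the oracle is modeled as a function `adjacent(u, v)`, the sample size is passed in explicitly rather than computed as $36k \ln k/\varepsilon^2$, and colorability of the sample is decided by plain backtracking, which is exponential in the sample size.

```python
import random


def is_k_colorable(vertices, adjacent, k):
    """Backtracking test for a proper k-coloring of the induced subgraph."""
    vertices = list(vertices)
    colors = {}

    def extend(idx):
        if idx == len(vertices):
            return True
        v = vertices[idx]
        for c in range(k):
            # c is allowed if no already colored neighbor of v uses it
            if all(not (adjacent(v, u) and colors[u] == c)
                   for u in vertices[:idx]):
                colors[v] = c
                if extend(idx + 1):
                    return True
                del colors[v]
        return False

    return extend(0)


def colorability_tester(n, adjacent, k, sample_size, rng):
    """Accept iff the induced subgraph on a random sample is k-colorable."""
    sample = rng.sample(range(n), min(sample_size, n))
    return is_k_colorable(sample, adjacent, k)
```

Every $k$-colorable input is accepted, since induced subgraphs of $k$-colorable graphs are $k$-colorable; the content of the theorem below is that inputs far from 3-colorable are rejected with high probability once the sample is large enough.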


Theorem 17.2.1 Suppose $0 < \varepsilon < \frac{1}{2}$, let $G = (V, E)$ be a graph on $n > 400/\varepsilon^2$
vertices, and suppose that one has to delete from $G$ at least $\varepsilon n^2$ edges to make it
3-colorable. Then the probability that an induced subgraph of $G$ on a randomly
chosen set of $s = 40/\varepsilon^2$ vertices is 3-colorable does not exceed $\frac{1}{10}$.


We start with an outline of the proof. Given $G = (V, E)$ as in the theorem, pick
a random subset $R \subset V$ of size $|R| = s = 40/\varepsilon^2$ in $s$ rounds, each time choosing
uniformly at random a single vertex $r_j$ among the vertices not selected so far.

Suppose that some subset $S \subset R$ has already been 3-colored by $\phi : S \to C$,
where $C = \{1, 2, 3\}$. The objective is to show that with high probability there is a
witness showing that this partial coloring cannot be extended to a proper coloring of
the induced subgraph on $R$. If a proper 3-coloring $c : V \to C$ of $G$ is to coincide
with $\phi$ on $S$, then for every vertex $v \in V \setminus S$, the colors of the neighbors of $v$ in $S$
under $\phi$ are forbidden for $v$ in $c$. The rest of the colors are still feasible for $v$. It could
be that $v$ has no feasible colors left at all. Such a vertex will be called colorless with
respect to $S$ and $\phi$. If the number of colorless vertices is large, then there is a decent
chance that among the next few randomly chosen vertices of $R$ there will be one such
colorless vertex $v^*$. Obviously, adding $v^*$ to $S$ provides the desired witness for
nonextendibility of $\phi$.

If the set of colorless vertices is small, then one can show that, as $G$ is $\varepsilon$-far from
being 3-colorable, there is a relatively large subset $W$ of vertices (which will be called
restricting) such that adding any vertex $v \in W$ to $S$ and coloring it by any feasible
color excludes this color from the lists of feasible colors of at least $\varepsilon n$ neighbors of $v$.
If such a vertex $v$ is found among the next few vertices of the random sample $R$, then
adding $v$ to $S$ and coloring it by any of its feasible colors reduces substantially the
total size of the lists of feasible colors for the remaining vertices of $V$, which helps to
approach the first situation, that is, the case when there are many colorless vertices.
This process can be represented by a tree in which every internal node corresponds
to a restricting vertex $v$, and every edge from $v$ to a child corresponds to a feasible
color for $v$. The tree will not be very large. Indeed, each of its internal vertices has
at most 3 children, and its depth cannot exceed $3/\varepsilon$, as the total size of the lists of
feasible colors at the beginning is $3n$, and this size is reduced by at least $\varepsilon n$ in each
step. It thus suffices to show that with high probability the construction of the whole
tree (until no feasible colors are left to any of its leaves) can be completed using the
vertices in our random set $R$.

We proceed with the formal proof. For a subset $S \subset V$, a 3-coloring of it
$\phi : S \to C$, and a vertex $v \in V \setminus S$ let $L_\phi(v)$ be the set of all colors in $C$ besides
those that appear already on some neighbor of $v$. This is the set of feasible colors for
$v$. Clearly, for $S = \emptyset$, $L_\phi(v) = C$ for every $v \in V$. A vertex $v \in V \setminus S$ is called
colorless if $L_\phi(v) = \emptyset$. Let $U$ denote the set of all colorless vertices under $(S, \phi)$.

For every vertex $v \in V \setminus (S \cup U)$ define

$$\delta_\phi(v) = \min_{i \in L_\phi(v)} |\{u \in N(v) \setminus (S \cup U) : i \in L_\phi(u)\}|.$$

Therefore coloring $v$ by any one of the colors from $L_\phi(v)$ and then adding it to $S$
will result in deleting this color and thus shortening the lists of feasible colors of at
least $\delta_\phi(v)$ neighbors of $v$ outside $S$.


Claim 17.2.2 For every set $S \subset V$ and every 3-coloring $\phi$ of $S$, the graph $G$ is at
most

$$(n - 1)|S \cup U| + \frac{1}{2} \sum_{v \in V \setminus (S \cup U)} \delta_\phi(v)$$

edges far from being 3-colorable.


Proof. Consider the following coloring of $G$: Every $v \in S$ is colored by $\phi(v)$, every
$v \in U$ is colored by an arbitrary color and every $v \in V \setminus (S \cup U)$ is colored by
a color $i \in L_\phi(v)$ for which $\delta_\phi(v) = |\{u \in N(v) \setminus (S \cup U) : i \in L_\phi(u)\}|$. The
number of monochromatic edges incident with $S \cup U$ is at most $(n - 1)|S \cup U|$.
Every vertex $v \in V \setminus (S \cup U)$ has exactly $\delta_\phi(v)$ neighbors $u \in V \setminus (S \cup U)$ whose
color list $L_\phi(u)$ contains the color chosen for $v$. Therefore $v$ will have at most $\delta_\phi(v)$
neighbors in $V \setminus (S \cup U)$ colored in the same color as $v$ itself. Hence the total number
of monochromatic edges is at most

$$(n - 1)|S \cup U| + \frac{1}{2} \sum_{v \in V \setminus (S \cup U)} \delta_\phi(v),$$

as claimed. ∎


Given a pair $(S, \phi)$, a vertex $v \in V \setminus (S \cup U)$ is called restricting if $\delta_\phi(v) \ge \varepsilon n$.
We denote by $W$ the set of all restricting vertices.
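
The definitions of feasible lists, colorless vertices and restricting vertices translate directly into code. In the sketch below the graph is assumed to be given as a dictionary `adj` mapping each vertex to its set of neighbors, and the partial coloring as a dictionary `phi` whose keys form $S$; these representations and the function names are our own choices.

```python
def feasible_colors(v, adj, phi, colors=(1, 2, 3)):
    """L_phi(v): colors not used by phi on any neighbor of v."""
    used = {phi[u] for u in adj[v] if u in phi}
    return {c for c in colors if c not in used}


def classify(adj, phi, eps, colors=(1, 2, 3)):
    """Return (U, W): colorless and restricting vertices for the pair (S, phi)."""
    n = len(adj)
    outside = [v for v in adj if v not in phi]            # V \ S
    lists = {v: feasible_colors(v, adj, phi, colors) for v in outside}
    U = {v for v in outside if not lists[v]}              # empty list
    W = set()
    for v in outside:
        if v in U:
            continue
        # delta_phi(v): minimum over feasible colors i of the number of
        # neighbors outside S and U whose list still contains i
        delta = min(sum(1 for u in adj[v]
                        if u in lists and u not in U and i in lists[u])
                    for i in lists[v])
        if delta >= eps * n:
            W.add(v)
    return U, W
```

On $K_4$ with three vertices colored 1, 2, 3 the fourth vertex is colorless; on a path with three vertices and $S = \emptyset$, only the middle vertex is restricting for $\varepsilon = 1/2$.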


Claim 17.2.3 For every pair $(S, \phi)$, where $S \subset V$ and $\phi : S \to C$,

$$|U \cup S \cup W| \ge \varepsilon n/2.$$

Proof. By the previous claim, and since $G$ is $\varepsilon$-far from being 3-colorable,

$$\begin{aligned}
\varepsilon n^2 &\le n(|S| + |U|) + \frac{1}{2} \sum_{v \in V \setminus (S \cup U)} \delta_\phi(v) \\
&\le n(|S| + |U|) + \frac{1}{2} |W|(n - 1) + \frac{1}{2} \sum_{v \in V \setminus (S \cup U \cup W)} \delta_\phi(v) \\
&< n(|S| + |U| + |W|) + \frac{1}{2} \varepsilon n^2.
\end{aligned}$$
∎

Returning to our randomly chosen vertices $r_1, \ldots, r_s$ of $R$, construct an auxiliary
ternary tree $T$. To distinguish between the vertices of $G$ and those of $T$ we call the
latter nodes. Each node of $T$ is labeled either by a vertex of $G$ or by the special
symbol #, whose meaning will be explained in what follows. If a node $t$ of $T$ is
labeled by #, then $t$ is called a terminal node. The edges of $T$ are labeled by integers
from $C$.

Let $t$ be a node of $T$. Consider the path from the root of $T$ to $t$, not including $t$
itself. The labels of the nodes along this path form a subset $S(t)$ of $V$. The labels of
the edges along the path define a 3-coloring $\phi(t)$ of $S(t)$ in a natural way: the label
of the edge following a node $t'$ in the path determines the color of its label $v(t')$. The
labeling of the nodes and edges of $T$ will have the following property: If $t$ is labeled
by $v$ and $v$ has a neighbor in $S(t)$ whose color in $\phi(t)$ is $i$, then the son of $t$ along
the edge labeled by $i$ is labeled by #. This label indicates the fact that in this case
color $i$ is infeasible for $v$, given $(S(t), \phi(t))$.

At each step of the construction of $T$ we will maintain the following: All leaves of
$T$ are either unlabeled or are labeled by #. Also, only leaves of $T$ can be labeled by
#. We start the construction of $T$ from an unlabeled single node, the root of $T$.

Suppose that $j - 1$ vertices of $R$ have already been chosen, and we are about to
choose vertex $r_j$ of $R$. Consider a leaf $t$ of $T$. If $t$ is labeled by #, we do nothing for
this leaf. (That is the reason such a $t$ is called a terminal node; nothing will ever grow
out of it.) Assume now that $t$ is unlabeled. Define the pair $(S(t), \phi(t))$ as described
above. Now, for the pair $(S(t), \phi(t))$ we define the set $U(t)$ of colorless vertices and
the set $W(t)$ of restricting vertices as described before. Round $j$ is called successful
for the node $t$ if the random vertex $r_j$ satisfies $r_j \in U(t) \cup W(t)$. If round $j$ is indeed
successful for $t$, then we label $t$ by $r_j$, create 3 sons of $t$ and label the corresponding
edges by 1, 2, 3. Now, if color $i$ is infeasible for $r_j$, given $(S(t), \phi(t))$, we label the
son of $t$ along the edge with label $i$ by #, otherwise we leave this son unlabeled.
Note that if $r_j \in U(t)$, then none of the colors from $C$ is feasible for $r_j$, and thus all
the sons of $t$ will be labeled by #. This completes the description of the process of
constructing $T$. As each edge along a path from the root to a leaf of the tree corresponds
to a restricting vertex, and the total size of all lists starts with $3n$ and is reduced by at
least $\varepsilon n$ with each coloring of a restricting vertex, we have the following.


Claim 17.2.4 The depth of $T$ is at most $3/\varepsilon$.


Our construction also implies that if a leaf $t^*$ of $T$ is labeled by #, then $\phi(t^*)$ is
not a proper 3-coloring of $S(t^*)$. We thus have the following.


Claim 17.2.5 If after round $j$ all leaves of the tree $T$ are terminal nodes, then the
induced subgraph of $G$ on $\{r_1, \ldots, r_j\}$ is not 3-colorable.


To complete the proof it thus suffices to show the following.


Claim 17.2.6 After $s = 40/\varepsilon^2$ rounds, with probability at least $\frac{9}{10}$ all leaves of $T$
are terminal nodes.


Proof. As every nonleaf node of $T$ has at most 3 sons and by Claim 17.2.4 the depth
of $T$ is at most $3/\varepsilon$, it can be embedded naturally in the ternary tree $T_{3,3/\varepsilon}$ of depth
$3/\varepsilon$. Moreover, this embedding can be fixed even before exposing $R$ and $T$. Note
that the number of vertices of $T_{3,3/\varepsilon}$ is $1 + 3 + \cdots + 3^{3/\varepsilon} < 3^{3/\varepsilon + 1}$.

Recall that during the construction of the random sample $R$ and the tree $T$, a
successful round for a leaf $t$ of $T$ results in creating 3 sons of $t$. Fix a node $t$ of
$T_{3,3/\varepsilon}$. If after $40/\varepsilon^2$ rounds $t$ is a leaf of $T$, then the total number of successful
rounds for the path from the root of $T$ to $t$ is equal to the depth of $t$. As $S(t) \subset R$
and thus $|S(t)| \le 40/\varepsilon^2 < \varepsilon n/10$, by Claim 17.2.3 each round has probability of
success at least $0.4\varepsilon$. Therefore, the probability that $t$ is a nonterminal leaf of $T$ after
$40/\varepsilon^2$ steps can be bounded from above by the probability that the binomial random
variable $B(40/\varepsilon^2, 0.4\varepsilon)$ is at most $3/\varepsilon$. The latter probability is at most

$$\exp\left(-\frac{(16/\varepsilon - 3/\varepsilon)^2}{32/\varepsilon}\right) = \exp\left(-\frac{169}{32\varepsilon}\right).$$

Thus by the union bound we conclude that the probability that some node of $T_{3,3/\varepsilon}$
is a leaf of $T$, not labeled by #, is at most

$$|V(T_{3,3/\varepsilon})| \exp\left(-\frac{169}{32\varepsilon}\right) < \frac{1}{10}.$$

The assertion of Theorem 17.2.1 follows from Claims 17.2.5 and 17.2.6. 
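
The union bound at the end of the proof is easy to evaluate numerically. The sketch below assumes the crude estimate $3^{3/\varepsilon + 1}$ for the number of nodes of the full ternary tree of depth $3/\varepsilon$ and the Chernoff-type estimate $\exp(-169/(32\varepsilon))$ for a single node, obtained from a binomial variable with mean $16/\varepsilon$ falling to $3/\varepsilon$ or below. The function name is ours.

```python
import math


def failure_bound(eps):
    """Evaluate 3^(3/eps + 1) * exp(-169 / (32 * eps)) via logarithms."""
    log_nodes = (3 / eps + 1) * math.log(3)   # log of the node-count estimate
    log_tail = -169 / (32 * eps)              # log of the per-node tail bound
    return math.exp(log_nodes + log_tail)
```

The value decreases rapidly as $\varepsilon$ shrinks, and under these estimates it is already below $1/10$ at $\varepsilon = 1/2$.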


17.3 SZEMERÉDI’S REGULARITY LEMMA


In this section we describe a fundamental result, the Regularity Lemma, proved by Endre
Szemerédi in the 1970s. The original motivation for proving it was an application
in combinatorial number theory, leading, together with several additional deep ideas,
to a complete solution of the Erdős–Turán conjecture discussed in Appendix B.2:
Every set of integers of positive upper density contains arbitrarily long arithmetic
progressions. It took some time to realize that the lemma is an extremely powerful
tool in extremal graph theory, combinatorics and theoretical computer science. Stated
informally, the Regularity Lemma asserts that the vertices of every large graph can
be decomposed into a finite number of parts, so that the edges between almost every
pair of parts form a random-looking graph. The power of the lemma is in the fact
that it deals with an arbitrary graph, making no assumptions, and yet it supplies much
useful information about its structure. It should be stressed that the impact of the
Regularity Lemma goes far beyond its applications in property testing, which is our
focus in this chapter. A detailed survey of the lemma and some of its many variants
and fascinating consequences can be found in Komlós and Simonovits (1996).

Let $G = (V, E)$ be a graph. For two disjoint nonempty subsets of vertices
$A, B \subset V$, let $e(A, B)$ denote the number of edges of $G$ with one end in $A$ and one
in $B$, and let

$$d(A, B) = \frac{e(A, B)}{|A||B|}$$

denote the density of the pair $(A, B)$. For a real $\varepsilon > 0$, a pair $(A, B)$ as above is
called $\varepsilon$-regular if for every $X \subset A$ and $Y \subset B$ that satisfy $|X| \ge \varepsilon|A|$, $|Y| \ge \varepsilon|B|$
the inequality $|d(A, B) - d(X, Y)| \le \varepsilon$ holds. It is not difficult to see that for every
fixed positive $\varepsilon$, $p$, a fixed pair of two sufficiently large disjoint subsets $A$ and $B$ of a
random graph $G = G(n, p)$ is very likely to be $\varepsilon$-regular of density roughly $p$. (This
is stated in one of the exercises at the end of the chapter.) Conversely, an $\varepsilon$-regular
pair $A, B$ with a sufficiently small positive $\varepsilon$ is random-looking in the sense that it
shares many properties satisfied by random (bipartite) graphs.
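
For very small vertex sets the definition of an $\varepsilon$-regular pair can be checked by exhaustive search. The sketch below assumes the graph is given as a set `edges` of two-element `frozenset`s; the function names are ours, and the running time is exponential in $|A| + |B|$.

```python
from fractions import Fraction
from itertools import combinations


def density(A, B, edges):
    """d(A, B) = e(A, B) / (|A| |B|) for disjoint nonempty A and B."""
    e = sum(1 for a in A for b in B if frozenset((a, b)) in edges)
    return Fraction(e, len(A) * len(B))


def subsets_of_size_at_least(S, lower):
    """Yield all nonempty subsets of S with at least `lower` elements."""
    S = sorted(S)
    for size in range(1, len(S) + 1):
        if size >= lower:
            yield from combinations(S, size)


def is_regular_pair(A, B, edges, eps):
    """Brute-force check of eps-regularity (tiny sets only)."""
    d = density(A, B, edges)
    for X in subsets_of_size_at_least(A, eps * len(A)):
        for Y in subsets_of_size_at_least(B, eps * len(B)):
            if abs(d - density(X, Y, edges)) > eps:
                return False
    return True
```

A complete bipartite pair is regular for every $\varepsilon$, whereas a pair in which one vertex of $A$ sees all of $B$ and the other sees none of it already fails for $\varepsilon = 0.4$.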

A partition $V = V_0 \cup V_1 \cup \cdots \cup V_k$ of $V$ into pairwise disjoint sets in which $V_0$
is called the exceptional set is an equipartition if $|V_1| = |V_2| = \cdots = |V_k|$. We view
the exceptional set as $|V_0|$ distinct parts, each consisting of a single vertex. For two
partitions $\mathcal{P}$ and $\mathcal{P}'$ as above, $\mathcal{P}'$ is a refinement of $\mathcal{P}$ if every part in $\mathcal{P}$ is a union
of some of the parts of $\mathcal{P}'$. By the last comment on the exceptional set this means,
in particular, that if $\mathcal{P}'$ is obtained from $\mathcal{P}$ by shifting vertices from the other sets in
the partition to the exceptional set, then $\mathcal{P}'$ is a refinement of $\mathcal{P}$. An equipartition is
called $\varepsilon$-regular if $|V_0| \le \varepsilon|V|$ and all pairs $(V_i, V_j)$ with $1 \le i < j \le k$, except at
most $\varepsilon k^2$ of them, are $\varepsilon$-regular.


Theorem 17.3.1 [The Regularity Lemma, Szemerédi (1978)] For every $\varepsilon > 0$ and
every integer $t$ there exists an integer $T = T(\varepsilon, t)$ so that every graph with at least $T$
vertices has an $\varepsilon$-regular partition $(V_0, V_1, \ldots, V_k)$, where $t \le k \le T$.


The basic idea in the proof is simple. Start with an arbitrary partition of the set of
vertices into $t$ disjoint classes of equal sizes (with a few vertices in the exceptional
set, if needed, to ensure divisibility by $t$). Proceed by showing that as long as the
existing partition is not $\varepsilon$-regular, it can be refined in a way that increases the weighted
average of the square of the density between a pair of classes of the partition by at
least a constant depending only on $\varepsilon$. As this average cannot exceed 1, the process
has to terminate after a bounded number of refinement steps. Since in each step we
control the growth in the number of parts as well as the number of extra vertices
thrown to the exceptional set, the desired result follows. The precise details require
some care and are given in what follows.

Let $G = (V, E)$ be a graph on $|V| = n$ vertices. For two disjoint subsets
$U, W \subset V$, define $q(U, W) = (|U||W|/n^2)\,d^2(U, W)$. For partitions $\mathcal{U}$ of $U$ and $\mathcal{W}$
of $W$, define

$$q(\mathcal{U}, \mathcal{W}) = \sum_{U' \in \mathcal{U},\, W' \in \mathcal{W}} q(U', W').$$

Finally, for a partition $\mathcal{P}$ of $V$, with an exceptional set $V_0$, define $q(\mathcal{P}) = \sum q(U, W)$,
where the sum ranges over all unordered pairs of distinct parts $U, W$ in the partition,
with each vertex of the exceptional set $V_0$ forming a singleton part of its own.
Therefore $q(\mathcal{P})$ is a sum of $\binom{k + |V_0|}{2}$ terms of the form $q(U, W)$. The quantity $q(\mathcal{P})$
is called the index of the partition $\mathcal{P}$. Since $d^2(U, W) \le 1$ for all $U, W$, and since the
sum $\sum |U||W|$ over all unordered pairs of distinct parts $U, W$ is at most the number
of unordered pairs of vertices, it follows that the index of any partition is smaller
than $\frac{1}{2}$.


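Lemma 17.3.2

(i) Let $U, W$ be disjoint nonempty subsets of $V$, let $\mathcal{U}$ be a partition of $U$ and $\mathcal{W}$
a partition of $W$. Then $q(\mathcal{U}, \mathcal{W}) \ge q(U, W)$.

(ii) If $\mathcal{P}'$ and $\mathcal{P}$ are partitions of $V$ and $\mathcal{P}'$ is a refinement of $\mathcal{P}$, then $q(\mathcal{P}') \ge q(\mathcal{P})$.

(iii) Suppose $\varepsilon > 0$, and suppose $U, W$ are disjoint nonempty subsets of $V$ and the
pair $(U, W)$ is not $\varepsilon$-regular. Then there are partitions $\mathcal{U} = \{U_1, U_2\}$ of $U$ and
$\mathcal{W} = \{W_1, W_2\}$ of $W$ so that $q(\mathcal{U}, \mathcal{W}) > q(U, W) + \varepsilon^4 |U||W|/n^2$.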


Proof.

(i) Define a random variable $Z$ as follows. Let $u$ be a uniformly chosen random
element of $U$, and let $w$ be a uniformly chosen random element of $W$. Let
$U' \in \mathcal{U}$ and $W' \in \mathcal{W}$ be those members of the partition so that $u \in U'$,
$w \in W'$. Then $Z = d(U', W')$.

The expectation of $Z$ is

$$\sum_{U' \in \mathcal{U},\, W' \in \mathcal{W}} \frac{|U'||W'|}{|U||W|}\, d(U', W') = \sum_{U' \in \mathcal{U},\, W' \in \mathcal{W}} \frac{|U'||W'|}{|U||W|} \cdot \frac{e(U', W')}{|U'||W'|} = d(U, W).$$

By Jensen’s Inequality, $E[Z^2] \ge (E[Z])^2$ and the desired result follows, as

$$E[Z^2] = \frac{n^2}{|U||W|}\, q(\mathcal{U}, \mathcal{W}) \quad \text{and} \quad (E[Z])^2 = d^2(U, W) = \frac{n^2}{|U||W|}\, q(U, W).$$

(ii) This is an immediate consequence of (i).

(iii) Since the pair $(U, W)$ is not $\varepsilon$-regular, there are subsets $U_1 \subset U$, $W_1 \subset W$
so that $|U_1| \ge \varepsilon|U|$, $|W_1| \ge \varepsilon|W|$ and $|d(U_1, W_1) - d(U, W)| > \varepsilon$. Put
$U_2 = U \setminus U_1$, $W_2 = W \setminus W_1$ and define the partitions $\mathcal{U} = \{U_1, U_2\}$,
$\mathcal{W} = \{W_1, W_2\}$. Let $Z$ be the random variable defined in the proof of part (i).
Then, as shown in that proof,

$$\mathrm{Var}[Z] = E[Z^2] - (E[Z])^2 = \frac{n^2}{|U||W|} \left( q(\mathcal{U}, \mathcal{W}) - q(U, W) \right).$$

However, as $E[Z] = d(U, W)$ it follows that $Z$ deviates from $E[Z]$ by more
than $\varepsilon$ with probability $|U_1||W_1|/|U||W|$, implying that

$$\mathrm{Var}[Z] > \frac{|U_1||W_1|}{|U||W|}\, \varepsilon^2 \ge \varepsilon^4.$$

This provides the desired result. ∎
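
Parts (i) and (ii) of the lemma can be observed on a toy example by computing the index exactly. The sketch below represents a partition as a list of vertex sets and the graph as a set of two-element `frozenset`s; both representations and the function names are our own assumptions.

```python
from fractions import Fraction
from itertools import combinations


def q_pair(U, W, edges, n):
    """q(U, W) = (|U||W| / n^2) * d(U, W)^2."""
    e = sum(1 for u in U for w in W if frozenset((u, w)) in edges)
    d = Fraction(e, len(U) * len(W))
    return Fraction(len(U) * len(W), n * n) * d * d


def index(parts, edges, n):
    """q(P): the sum of q(U, W) over unordered pairs of distinct parts."""
    return sum((q_pair(U, W, edges, n) for U, W in combinations(parts, 2)),
               Fraction(0))
```

On four vertices with the single edge $\{0, 2\}$, the partition $\{0, 1\}, \{2, 3\}$ has index $1/64$, while its refinement into singletons has index $1/16$; both stay below $1/2$.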


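Proposition 17.3.3 Suppose $0 < \varepsilon \le \frac{1}{4}$, let $\mathcal{P} = \{V_0, V_1, \ldots, V_k\}$ be an equipartition
of $V$, where $V_0$ is the exceptional set, $|V_0| \le \varepsilon n$, and $|V_i| = c$ for all $1 \le i \le k$.
If $\mathcal{P}$ is not $\varepsilon$-regular then there exists a refinement $\mathcal{P}' = \{V_0', V_1', \ldots, V_\ell'\}$ of $\mathcal{P}$, in
which $k \le \ell \le k4^k$, $|V_0'| \le |V_0| + n/2^k$, all other sets $V_i'$ are of the same size and
$q(\mathcal{P}') \ge q(\mathcal{P}) + \frac{1}{2}\varepsilon^5$.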


Proof. For every pair $1 \le i < j \le k$ define a partition $\mathcal{V}_{ij}$ of $V_i$ and $\mathcal{V}_{ji}$ of $V_j$ as
follows. If the pair $(V_i, V_j)$ is $\varepsilon$-regular, then the two partitions are trivial. Otherwise,
each partition consists of two parts, chosen according to Lemma 17.3.2, part (iii).
For each $1 \le i \le k$, let $\mathcal{V}_i$ be the partition of $V_i$ obtained by the Venn diagram of all
$(k - 1)$ partitions $\mathcal{V}_{ij}$. Thus each $\mathcal{V}_i$ has at most $2^{k-1}$ parts. Let $\mathcal{Q}$ be the partition of
$V$ consisting of all parts of the partitions $\mathcal{V}_i$ together with the original exceptional set
$V_0$. By Lemma 17.3.2, parts (ii) and (iii), and since $\mathcal{P}$ is not $\varepsilon$-regular, we conclude
that the index of $\mathcal{Q}$ satisfies

$$q(\mathcal{Q}) \ge q(\mathcal{P}) + \varepsilon k^2 \cdot \frac{\varepsilon^4 c^2}{n^2} = q(\mathcal{P}) + \varepsilon^5 \frac{(kc)^2}{n^2} \ge q(\mathcal{P}) + \frac{\varepsilon^5}{2},$$

where here we used the fact that $kc \ge (1 - \varepsilon)n \ge \frac{3}{4}n$. Note that $\mathcal{Q}$ has at most
$k2^{k-1}$ parts (besides the exceptional set), but those are not necessarily of equal sizes.
Define $b = \lfloor c/4^k \rfloor$ and split every part of $\mathcal{Q}$ arbitrarily into disjoint sets of size $b$,
throwing the remaining vertices in each part, if any, to the exceptional set. This
process creates a partition $\mathcal{P}'$ with at most $k4^k$ nonexceptional parts of equal size,
and a new exceptional set $V_0'$ of size smaller than $|V_0| + k2^{k-1}b \le |V_0| + kc/2^k \le
|V_0| + n/2^k$. Moreover, by Lemma 17.3.2, part (ii), the index $q(\mathcal{P}')$ of $\mathcal{P}'$ is at least
$q(\mathcal{Q}) \ge q(\mathcal{P}) + \frac{1}{2}\varepsilon^5$, completing the proof. ∎


Proof [Theorem 17.3.1]. It suffices to prove the lemma for $\varepsilon \le \frac14$ and $t$ satisfying $2^{t-2} > 1/\varepsilon^6$; hence we assume that these inequalities hold. Put $s = \lceil 1/\varepsilon^5 \rceil$ and note that for this choice $1/2^k \le \varepsilon/2s$ for all $k \ge t$. Define $k_0 = t$ and $k_{i+1} = k_i 4^{k_i}$ for all $i \ge 0$. We prove the lemma with $T = k_s$.

Let $G = (V, E)$ be a graph with $|V| = n \ge T$ vertices. Start with an arbitrary partition $P = P_0$ of its vertices into $k = k_0 = t$ pairwise disjoint parts, each of size $\lfloor n/t \rfloor$, and let the exceptional set consist of the remaining vertices, if any. Note that their number is less than $t$, which is (much) smaller than $\frac12\varepsilon n$. As long as the partition $P$ we have already defined is not $\varepsilon$-regular, apply Proposition 17.3.3 to refine it to a new equipartition $P'$ with at most $k4^k$ nonexceptional parts, whose index exceeds that of $P$ by at least $\frac12\varepsilon^5$, while the size of the exceptional set increases by at most $n/2^k \le \varepsilon n/2s$. As the initial index is nonnegative, and the index never exceeds $\frac12$, the process must terminate in at most $s$ steps, yielding an $\varepsilon$-regular partition with at most $T$ nonexceptional parts, and an exceptional set of size smaller than $\varepsilon n$. ∎


Remark. The proof shows that $T(\varepsilon, \lceil 1/\varepsilon \rceil)$ is bounded by a tower of exponents of height roughly $1/\varepsilon^5$. Surprisingly, as shown by Gowers (1997), this tower-type behavior is indeed necessary.
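To get a feeling for how fast the recursion $k_0 = t$, $k_{i+1} = k_i 4^{k_i}$ from the proof grows, the following small sketch iterates it with exact integer arithmetic. The starting value $t = 2$ is a toy choice made only so that the numbers fit in memory; the proof itself needs a much larger $t$, and the function name is illustrative.

```python
def tower_sizes(t, steps):
    """Return [k_0, ..., k_steps] with k_0 = t and k_{i+1} = k_i * 4**k_i."""
    ks = [t]
    for _ in range(steps):
        k = ks[-1]
        ks.append(k * 4**k)
    return ks

ks = tower_sizes(2, 2)                  # k_0, k_1, k_2 for the toy choice t = 2
bits = [k.bit_length() for k in ks]     # number of binary digits of each k_i
```

Already $k_2 = 2^{69}$ here, and $k_3$ would have about $2^{70}$ binary digits, so `steps` has to stay very small. Each step adds one level to the tower, which is consistent with the height of roughly $1/\varepsilon^5$ noted in the Remark.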


17.4 TESTING TRIANGLE-FREENESS 


The relevance of the Regularity Lemma to property testing is nicely illustrated in the proof that the property of containing no triangle is testable with one-sided error. The required combinatorial lemma here is the (intuitive, yet nontrivial) fact that if one has to delete at least $\varepsilon n^2$ edges of an $n$-vertex graph to destroy all triangles in it, then the graph must contain at least $\delta n^3$ triangles, where $\delta = \delta(\varepsilon) > 0$. As shown in the exercises, following Ruzsa and Szemerédi (1978), this fact implies that any set of integers with positive upper density contains a three-term arithmetic progression.


Lemma 17.4.1 For any positive $\varepsilon < 1$ there is a $\delta = \delta(\varepsilon) > 0$ so that if $G = (V, E)$ is a graph on $|V| = n$ vertices that is $\varepsilon$-far from being triangle-free, then it contains at least $\delta n^3$ triangles.


Proof. We prove the lemma with

$$\delta = \frac{\varepsilon^3}{2^9\, T(\varepsilon/4, \lceil 4/\varepsilon \rceil)^3},$$

where $T$ is as in Theorem 17.3.1. Let $G = (V, E)$ satisfy the assumption. Note, first, that if $n < T(\varepsilon/4, \lceil 4/\varepsilon \rceil)$ then the assertion is trivial, as in this case $\delta n^3$ is less than 1, and it is trivial that if $G$ is $\varepsilon$-far from being triangle-free then it contains a triangle. We thus assume that $n$ is at least $T(\varepsilon/4, t)$, where $t = \lceil 4/\varepsilon \rceil$. By Theorem 17.3.1 there is an $(\varepsilon/4)$-regular partition $(V_0, V_1, \ldots, V_k)$ of $G$, where $t \le k \le T = T(\varepsilon/4, t)$. Put $c = |V_1| = |V_2| = \cdots = |V_k|$. Let $G'$ be the graph obtained from $G$ by deleting the
following edges:

• All edges of $G$ that are incident with a vertex of the exceptional set $V_0$ (there are less than $\frac14\varepsilon n^2$ such edges).

• All edges of $G$ that lie inside some set $V_i$ (there are less than $\frac18\varepsilon n^2$ such edges).

• All edges of $G$ that lie in irregular pairs (there are at most $\frac14\varepsilon k^2 c^2 \le \frac14\varepsilon n^2$ such edges).

• All edges of $G$ that lie in regular pairs $(V_i, V_j)$, where the density $d(V_i, V_j)$ is smaller than $\frac12\varepsilon$ (there are less than $\binom{k}{2}\frac12\varepsilon c^2 < \frac14\varepsilon n^2$ such edges).


Since $G'$ is obtained from $G$ by deleting less than $\varepsilon n^2$ edges, it contains a triangle, as $G$ is $\varepsilon$-far from being triangle-free. By the definition of $G'$, the vertices of this triangle must lie in three distinct sets $V_i$, any two of which form a regular pair of density at least $\frac12\varepsilon$. Without loss of generality assume that these sets are $V_1, V_2, V_3$. Call a vertex $v_1 \in V_1$ typical if it has at least $\frac14\varepsilon c$ neighbors in $V_2$ and at least $\frac14\varepsilon c$ neighbors in $V_3$. We claim that all vertices of $V_1$ but at most $2\cdot\frac14\varepsilon c < \frac12 c$ are typical. Indeed, if $X_1$ is the set of all vertices of $V_1$ that have less than $\frac14\varepsilon c$ neighbors in $V_2$ then its cardinality must be smaller than $\frac14\varepsilon c$, since otherwise the pair $X_1$ and $X_2 = V_2$, together with the fact that $d(V_1, V_2) \ge \frac12\varepsilon$, would violate the $(\varepsilon/4)$-regularity of this pair. Similarly, there are less than $\frac14\varepsilon c$ vertices of $V_1$ that have less than $\frac14\varepsilon c$ neighbors in $V_3$, proving the claim.

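Fix a typical vertex $v_1 \in V_1$, and let $N_2$, $N_3$ denote the sets of all its neighbors in $V_2$ and $V_3$, respectively. Thus $|N_2|, |N_3| \ge \frac14\varepsilon c$, and hence, by the $(\varepsilon/4)$-regularity of the pair $(V_2, V_3)$ and the fact that its density is at least $\frac12\varepsilon$, there are at least $\frac14\varepsilon|N_2||N_3| \ge (\varepsilon/4)^3 c^2$ edges between $N_2$ and $N_3$. We conclude that $v_1$ lies in at least $(\varepsilon/4)^3 c^2$ triangles. As there are at least $\frac12 c$ typical vertices in $V_1$, and since

$$c \ge \left(1 - \frac{\varepsilon}{4}\right)\frac{n}{k} \ge \left(1 - \frac{\varepsilon}{4}\right)\frac{n}{T},$$

the desired result follows. ∎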


Corollary 17.4.2 The property of being triangle-free is testable with one-sided error. 


Proof. For $\varepsilon > 0$, let $\delta = \delta(\varepsilon)$ be as in Lemma 17.4.1. Given a graph $G = (V, E)$ on $n$ vertices, consider the following randomized algorithm for testing if $G$ is triangle-free. Let $s$ be a confidence parameter. Pick randomly and independently $s/\delta$ triples of vertices of the graph, and check if at least one of them forms a triangle. If so, report that the graph is not triangle-free; otherwise, report that the graph is triangle-free. Clearly, if $G$ is triangle-free, the algorithm will decide so. If it is $\varepsilon$-far from being triangle-free, then by Lemma 17.4.1 each sampled triple forms a triangle with probability at least $6\delta$, so the probability that the algorithm will err and report that $G$ is triangle-free does not exceed $(1 - 6\delta)^{s/\delta} < e^{-6s}$. This completes the proof. ∎
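The tester from this proof can be sketched in a few lines. The version below is a minimal illustration under stated assumptions: the graph is given as a dictionary of neighbor sets without loops, and the caller supplies the number of samples directly (the value $s/\delta$ from the proof is not practical to compute, since $\delta$ is of tower type). All function names are hypothetical.

```python
import random

def looks_triangle_free(adj, num_samples, rng=random):
    """One-sided tester: sample triples of vertices independently and
    uniformly (with repetition); reject only if a sampled triple is a triangle."""
    vertices = list(adj)
    for _ in range(num_samples):
        u, v, w = (rng.choice(vertices) for _ in range(3))
        if v in adj[u] and w in adj[v] and u in adj[w]:
            return False   # a witness triangle was found
    return True            # no triangle seen: report "triangle-free"

def complete_graph(n):
    """K_n as a dict of neighbor sets (no loops)."""
    return {v: {u for u in range(n) if u != v} for v in range(n)}

def complete_bipartite(m):
    """K_{m,m}, a triangle-free graph, as a dict of neighbor sets."""
    left, right = range(m), range(m, 2 * m)
    adj = {v: set(right) for v in left}
    adj.update({v: set(left) for v in right})
    return adj
```

Because the tester rejects only after exhibiting a triangle, it never errs on a triangle-free input such as `complete_bipartite(5)`; the error is confined to graphs that contain triangles, which is what one-sided error means here.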
17.5 CHARACTERIZING THE TESTABLE GRAPH PROPERTIES


In this section we describe several recent results on graph property testing. The 
proofs of these results apply a strong variant of the Regularity Lemma, proved in 
Alon, Fischer, Krivelevich and Szegedy (2000). The detailed proofs are somewhat 
technical and will not be given here. 

A graph property is monotone if it is closed under removing vertices and edges. Thus
being k-colorable or triangle-free is a monotone property. A property is hereditary if 
it is closed under removal of vertices (and not necessarily under removal of edges). 
Clearly every monotone graph property is also hereditary, but there are also many 
well-studied hereditary properties that are not monotone. Examples are being a per- 
fect graph, a chordal graph, an interval graph and many more. The results discussed 
in the previous subsections deal with two special cases of hereditary properties that 
are also monotone, namely, being triangle-free and being k-colorable. Handling 
hereditary non-monotone graph properties, such as being perfect or not containing 
an induced cycle of length 4, is more involved than handling monotone properties. 

For a (possibly infinite) family of graphs $\mathcal{F}$, a graph $G$ is said to be induced $\mathcal{F}$-free if it contains no $F \in \mathcal{F}$ as an induced subgraph. The following lemma is not difficult.


Lemma 17.5.1 Let $\mathcal{F}$ be a (possibly infinite) family of graphs, and suppose there are functions $f_{\mathcal{F}}(\varepsilon)$ and $\delta_{\mathcal{F}}(\varepsilon)$ such that the following holds for every $\varepsilon > 0$: Every graph $G$ on $n$ vertices that is $\varepsilon$-far from being induced $\mathcal{F}$-free contains at least $\delta_{\mathcal{F}}(\varepsilon)n^f$ induced copies of a graph $F \in \mathcal{F}$ of size $f \le f_{\mathcal{F}}(\varepsilon)$. Then, being induced $\mathcal{F}$-free is testable with one-sided error.


The following general result is proved in Alon and Shapira (2005). A subsequent different, elegant, but noneffective proof can be found in Lovász and Szegedy (to appear).


Theorem 17.5.2 [Alon and Shapira (2005)] For any (possibly infinite) family of graphs $\mathcal{F}$ there are functions $f_{\mathcal{F}}(\varepsilon)$ and $\delta_{\mathcal{F}}(\varepsilon)$ satisfying the conditions of Lemma 17.5.1.


It is easy to see that one can define for any hereditary property $\mathcal{P}$, a (possibly infinite) family of graphs $\mathcal{F}_{\mathcal{P}}$ such that satisfying $\mathcal{P}$ is equivalent to being induced $\mathcal{F}_{\mathcal{P}}$-free. Indeed, we simply put a graph $F$ in $\mathcal{F}_{\mathcal{P}}$ if and only if $F$ does not satisfy $\mathcal{P}$. It thus follows that Theorem 17.5.2, combined with Lemma 17.5.1, implies the following.


Theorem 17.5.3 Every hereditary graph property is testable with one-sided error.

An easy consequence of Theorem 17.5.2 is the following.

Corollary 17.5.4 For every hereditary graph property $\mathcal{P}$, there is a function $W_{\mathcal{P}}(\varepsilon)$ with the following property: If $G$ is $\varepsilon$-far from satisfying $\mathcal{P}$, then $G$ contains an induced subgraph of size at most $W_{\mathcal{P}}(\varepsilon)$ that does not satisfy $\mathcal{P}$.

Using Theorem 17.5.3 one can obtain a characterization of the "natural" graph properties that are testable with one-sided error.

A tester (one-sided or two-sided) is said to be oblivious if it works as follows: Given $\varepsilon$ the tester computes an integer $Q = Q(\varepsilon)$ and asks an oracle for a subgraph induced by a set of vertices $S$ of size $Q$, where the oracle chooses $S$ randomly and uniformly from the vertices of the input graph. If $Q$ is larger than the size of the input graph then the oracle returns the entire graph. The tester then accepts or rejects according to the graph induced by $S$.

In some sense, oblivious testers capture the essence of property testing as essen- 
tially all the testers that have been analyzed in the literature are in fact oblivious, or 
could easily be turned into oblivious testers. Clearly some properties cannot have 
oblivious testers; however, these properties are not natural. An example is the prop- 
erty of not containing an induced cycle of length 4 if the number of vertices is even, 
and not containing an induced cycle of length 5 if the number of vertices is odd. 

Using Theorem 17.5.3 it can be shown that if one considers only oblivious testers, then it is possible to precisely characterize the graph properties that are testable with one-sided error. To state this characterization we need the following definition.

A graph property $\mathcal{P}$ is called semi-hereditary if there exists a hereditary graph property $\mathcal{H}$ such that the following holds.


1. Any graph satisfying $\mathcal{P}$ also satisfies $\mathcal{H}$.


2. For any $\varepsilon > 0$ there is an $M(\varepsilon)$ such that any graph of size at least $M(\varepsilon)$ that is $\varepsilon$-far from satisfying $\mathcal{P}$ does not satisfy $\mathcal{H}$.


Clearly any hereditary graph property $\mathcal{P}$ is also semi-hereditary because we can take $\mathcal{H}$ in the above definition to be $\mathcal{P}$ itself. In simple words, a semi-hereditary $\mathcal{P}$ is obtained by taking a hereditary graph property $\mathcal{H}$ and removing from it a (possibly infinite, carefully chosen) set of graphs. This means that the first item in the definition above is satisfied. The only restriction, which is needed to get item 2 in the definition, is that $\mathcal{P}$ must be such that for any $\varepsilon > 0$ there are only finitely many graphs that are $\varepsilon$-far from satisfying it, and yet satisfy $\mathcal{H}$. We are now ready to state the characterization.


Theorem 17.5.5 A graph property P has an oblivious one-sided tester if and only if 
P is semi-hereditary. 


The proof can be found in Alon and Shapira (2005). The Regularity Lemma and 
its strong variant mentioned in the beginning of this subsection play a crucial role in 
this proof. This is not a coincidence. In Alon, Fischer, Newman and Shapira (2006) it 
is shown that the property defined by having any given Szemerédi-partition is testable 
with a constant number of queries. This leads to a combinatorial characterization 
of the graph properties that are testable with a constant number of queries. This 
characterization (roughly) says that a graph property P can be tested by a two-sided 
error tester with a constant number of queries if and only if testing P can be reduced 
to testing the property of satisfying one of finitely many Szemerédi-partitions. See 
Alon et al. (2006) for the precise formulation and detailed proof.

17.6 EXERCISES


1. Show that for every fixed $\varepsilon > 0$ and $0 < p < 1$ there is an $m_0 = m_0(\varepsilon, p)$ so that for every $n \ge 2m \ge m_0$, the probability that two fixed disjoint sets $A$ and $B$, each of size $m$, of the random graph $G(n, p)$ do not form an $\varepsilon$-regular pair is smaller than $\varepsilon$.


2. (Removal Lemma.) Show that for any fixed graph $H$ on $h$ vertices and for any $\varepsilon > 0$ there is a $\delta = \delta(\varepsilon, H) > 0$ so that if one has to delete at least $\varepsilon n^2$ edges from an $n$-vertex graph $G$ to destroy all copies of $H$, then $G$ contains at least $\delta n^h$ copies of $H$.


3. (*) Using Lemma 17.4.1 prove that for any $\varepsilon > 0$ there is an $n_0$ so that if $n > n_0$ then every subset $A \subset \{1, 2, \ldots, n\}$ of size $|A| \ge \varepsilon n$ contains a three-term arithmetic progression.


4. Combine Turán's Theorem with the Regularity Lemma to prove the following result, due to Erdős, Simonovits and Stone: For every fixed graph $H$ of chromatic number $r > 1$ and every $\varepsilon > 0$, there is an $n_0 = n_0(H, \varepsilon)$ so that if $n > n_0$ then any simple graph with $n$ vertices and at least

$$\left(1 - \frac{1}{r-1} + \varepsilon\right)\binom{n}{2}$$

edges contains a copy of $H$.


5. A graph is chordal if any cycle of length at least 4 in it has a chord. Apply Corollary 17.5.4 to show that for every $\varepsilon > 0$ there is a $k = k(\varepsilon)$ so that every graph on $n$ vertices in which every cycle of length at least 4 and at most $k$ has a chord can be transformed into a chordal graph by adding and/or deleting at most $\varepsilon n^2$ edges.


6. (*) A construction of Behrend (1946) gives a subset $X$ of $\{1, 2, \ldots, m\}$ of size $|X| \ge m/e^{10\sqrt{\log m}}$ with no three-term arithmetic progression. Show how to construct from such an $X$ a graph on $n$ vertices, which is $\varepsilon$-far from being triangle-free, and yet contains only $\varepsilon^{c\log(1/\varepsilon)}n^3$ triangles.


7. Prove that the property of being triangle-free is not testable with a one-sided error tester whose query complexity is polynomial in $1/\varepsilon$.


8. A graph $G$ is $H$-free if it contains no copy of $H$. Prove that for every bipartite graph $H$ with $h$ vertices, there is a $c = c(h) > 0$ so that any graph $G$ on $n$ vertices that is $\varepsilon$-far from being $H$-free contains at least $\varepsilon^c n^h$ copies of $H$.


THE PROBABILISTIC LENS: 


Turán Numbers and
Dependent Random Choice 


For a graph $H$ and an integer $n$, the Turán number $\mathrm{ex}(n, H)$ is the maximum possible number of edges in a simple graph on $n$ vertices that contains no copy of $H$. The asymptotic behavior of these numbers for graphs of chromatic number at least 3 is well known; see, for example, Exercise 4 in Chapter 17. For bipartite graphs $H$, however, the situation is considerably more complicated, and there are relatively few nontrivial bipartite graphs $H$ for which the order of magnitude of $\mathrm{ex}(n, H)$ is known. Here we prove that for every fixed bipartite graph $H$ in which the degrees of all vertices in one color class are at most $r$, there is a constant $c = c(H)$ so that $\mathrm{ex}(n, H) < cn^{2-1/r}$. This is tight for all values of $r$, as it is known that for every $r$ and $t > (r-1)!$, there is a simple graph with $n$ vertices and at least $c_{r,t}\,n^{2-1/r}$ edges, containing no copy of the complete bipartite graph $K_{r,t}$.

The basic tool in the proof is a simple and yet surprisingly powerful lemma, whose probabilistic proof may be called "dependent random choice," as it involves a random selection of a set of vertices, where the choices are dependent in a way that increases the probability that $r$-tuples of selected vertices will have many common neighbors. An early variant of this lemma was first proved in Kostochka and Rödl (2004) and Gowers (1998). The proof given here is from Alon, Krivelevich and Sudakov (2003).


Lemma 1 Let $a, b, n, r$ be positive integers. Let $G = (V, E)$ be a graph on $|V| = n$ vertices with average degree $d = 2|E|/n$. If

$$\frac{d^r}{n^{r-1}} - \binom{n}{r}\left(\frac{b-1}{n}\right)^r > a - 1, \qquad (1)$$

then $G$ contains a subset $A_0$ of at least $a$ vertices so that every $r$ vertices of $A_0$ have at least $b$ common neighbors.


Proof. Let $T$ be a (multi)-set of $r$ random vertices of $G$, chosen uniformly with repetitions. Set

$$A = \{v \in V : T \subset N(v)\},$$

where $N(v)$ denotes the set of all neighbors of $v$. Denote by $X$ the cardinality of $A$. By linearity of expectation,

$$E[X] = \sum_{v \in V}\left(\frac{|N(v)|}{n}\right)^r = \frac{1}{n^r}\sum_{v \in V}|N(v)|^r \ge \frac{1}{n^{r-1}}\left(\frac{\sum_{v \in V}|N(v)|}{n}\right)^r = \frac{1}{n^{r-1}}\left(\frac{2|E|}{n}\right)^r = \frac{d^r}{n^{r-1}},$$

where the inequality follows from the convexity of $f(x) = x^r$.

Let $Y$ denote the random variable counting the number of $r$-tuples in $A$ with fewer than $b$ common neighbors. For a given $r$-tuple $R \subset V$, the probability that $R$ will be a subset of $A$ is precisely $(|N^*(R)|/n)^r$, where $N^*(R)$ denotes the set of all common neighbors of the vertices in $R$. As there are at most $\binom{n}{r}$ subsets $R$ of cardinality $|R| = r$ for which $|N^*(R)| \le b - 1$, it follows that

$$E[Y] \le \binom{n}{r}\left(\frac{b-1}{n}\right)^r.$$

Applying linearity of expectation once again, we conclude by (1) that

$$E[X - Y] \ge \frac{d^r}{n^{r-1}} - \binom{n}{r}\left(\frac{b-1}{n}\right)^r > a - 1.$$

Hence there exists a choice for $T$ so that for the corresponding set $A$ we get $X - Y \ge a$. Pick such a set, and omit a point from every $r$-tuple in it with fewer than $b$ common neighbors. This gives a set $A_0$ of at least $a$ vertices so that every $r$ vertices in it have at least $b$ common neighbors. ∎
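The selection step in this proof is easy to simulate. The sketch below performs one round of it: it picks the multiset $T$, forms $A$ as the common neighborhood of $T$, and then deletes one vertex from each $r$-tuple of $A$ with fewer than $b$ common neighbors. The dictionary-of-sets graph representation and the function names are assumptions made for illustration. A single round guarantees only the common-neighbor property of the output, not the size bound $|A_0| \ge a$, which the lemma obtains by an averaging argument over $T$.

```python
import random
from itertools import combinations

def common_neighbors(adj, vertices):
    """Set of vertices adjacent to every vertex in `vertices`."""
    sets = [adj[v] for v in vertices]
    return set.intersection(*sets) if sets else set(adj)

def dependent_random_choice(adj, r, b, rng=random):
    """One round of the argument: pick T, form A, then prune bad r-tuples."""
    vertices = list(adj)
    T = [rng.choice(vertices) for _ in range(r)]   # r picks, with repetition
    A = common_neighbors(adj, T)                   # A = {v : T is contained in N(v)}
    for R in combinations(sorted(A), r):
        # Skip tuples already broken by an earlier removal.
        if all(v in A for v in R) and len(common_neighbors(adj, R)) < b:
            A.discard(R[0])                        # omit one point of a bad tuple
    return A
```

In practice one would repeat the round with fresh randomness and keep the largest output; when condition (1) holds, the expectation computation above shows that some choice of $T$ yields at least $a$ surviving vertices.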




Theorem 2 Let $H$ be a bipartite graph with maximum degree $r$ on one side. Then there exists a constant $c = c(H) > 0$ such that

$$\mathrm{ex}(n, H) < cn^{2-1/r}.$$


Proof. Let $A$ and $B$ be the vertex classes of $H$, and suppose $|A| = a$, $|A| + |B| = b$, where the degree of every vertex of $B$ in $H$ does not exceed $r$. Let $G = (V, E)$ be a graph on $|V| = n$ vertices with average degree $d = 2|E|/n \ge cn^{1-1/r}$, where $c$ satisfies $c^r > (b-1)^r/r! + a - 1$. It is easy to check that (1) holds. To complete the proof, it suffices to show that $G$ must contain a copy of $H$. By Lemma 1 there is a subset $A_0 \subset V$ of cardinality $|A_0| = a$ so that every $r$-subset of $A_0$ has at least $b$ common neighbors in $G$. It is now an easy matter to embed $H$ in $G$. To do so, start by embedding the vertices of $A$ by an arbitrary injective function from $A$ to $A_0$. Proceed by embedding the vertices of $B$ one by one in an arbitrary order, making sure that in each step the image of the new embedded vertex is connected to the images of its neighbors in $H$ and is different from the images of all previously embedded vertices. Since every set of (at most) $r$ vertices of $A_0$ has at least $b$ common neighbors in $G$, this process can be performed until the images of all $b$ vertices of $H$ are found. This completes the proof. ∎

Appendix A
Bounding of Large Deviations


A.1 CHERNOFF BOUNDS


We give here some basic bounds on large deviations that are useful when employing 
the probabilistic method. Our treatment is self-contained. Most of the results may be 
found in, or immediately derived from, the seminal paper of Chernoff (1952). While 
we are guided by asymptotic considerations the inequalities are proved for all values 
of the parameters in the specified region. The first result, while specialized, contains 
basic ideas found throughout the appendix. 


Theorem A.1.1 Let $X_i$, $1 \le i \le n$, be mutually independent random variables with

$$\Pr[X_i = +1] = \Pr[X_i = -1] = \frac12$$

and set, following the usual convention,

$$S_n = X_1 + \cdots + X_n.$$

Let $a > 0$. Then

$$\Pr[S_n > a] < e^{-a^2/2n}.$$

We require Markov's Inequality, which states: Suppose that $Y$ is an arbitrary nonnegative random variable, $\alpha > 0$. Then

$$\Pr[Y > \alpha E[Y]] < \frac{1}{\alpha}.$$


Proof. Fix $n$, $a$ and let, for the moment, $\lambda > 0$ be arbitrary. For $1 \le i \le n$,

$$E[e^{\lambda X_i}] = \frac{e^{\lambda} + e^{-\lambda}}{2} = \cosh(\lambda).$$

We require the inequality $\cosh(\lambda) \le e^{\lambda^2/2}$, valid for all $\lambda > 0$, the special case $\alpha = 0$ of Lemma A.1.5 below. (The inequality may be shown more easily by comparing the Taylor series of the two functions termwise.)

$$e^{\lambda S_n} = \prod_{i=1}^{n} e^{\lambda X_i}.$$

Since the $X_i$ are mutually independent so are the $e^{\lambda X_i}$; expectations multiply and

$$E[e^{\lambda S_n}] = \prod_{i=1}^{n} E[e^{\lambda X_i}] = \cosh^n(\lambda) \le e^{\lambda^2 n/2}.$$

We note that $S_n > a$ if and only if $e^{\lambda S_n} > e^{\lambda a}$ and apply Markov's Inequality so that

$$\Pr[S_n > a] = \Pr[e^{\lambda S_n} > e^{\lambda a}] < E[e^{\lambda S_n}]/e^{\lambda a} \le e^{\lambda^2 n/2 - \lambda a}.$$

We set $\lambda = a/n$ to optimize the inequality, $\Pr[S_n > a] < e^{-a^2/2n}$ as claimed. ∎


By symmetry we immediately have the following. 


Corollary A.1.2 Under the assumptions of Theorem A.1.1,

$$\Pr[|S_n| > a] < 2e^{-a^2/2n}.$$
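For small $n$ the bound of Theorem A.1.1 can be compared with the exact tail, since $S_n = 2H - n$ where $H$ is binomial with parameters $n$ and $1/2$. The following sketch does this with exact integer binomial coefficients; the function names and the sample values of $n$ and $a$ are illustrative choices.

```python
from math import comb, exp

def exact_tail(n, a):
    """Pr[S_n > a] for S_n a sum of n independent uniform +1/-1 variables."""
    # S_n = 2H - n where H ~ Binomial(n, 1/2), so S_n > a iff 2H - n > a.
    favourable = sum(comb(n, h) for h in range(n + 1) if 2 * h - n > a)
    return favourable / 2**n

def chernoff_bound(n, a):
    """The bound exp(-a^2 / 2n) of Theorem A.1.1."""
    return exp(-a * a / (2 * n))

n = 100
table = [(a, exact_tail(n, a), chernoff_bound(n, a)) for a in (10, 20, 30, 40)]
```

Each row of `table` holds a deviation $a$, the exact probability, and the Chernoff bound. The bound is not tight for small $a$ but captures the correct Gaussian-type decay in the exponent.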


The proof of Theorem A.1.1 illustrates the basic idea of the Chernoff bounds. We wish to bound $\Pr[X > a]$ for some random variable $X$. For any positive $\lambda$ we bound

$$\Pr[X > a] = \Pr[e^{\lambda X} > e^{\lambda a}] < E[e^{\lambda X}]\,e^{-\lambda a}. \qquad (A.1)$$

The core idea of the Chernoff bounds is to select that $\lambda$ that minimizes $E[e^{\lambda X}]\,e^{-\lambda a}$. The art to the Chernoff bounds is to select a $\lambda$ that is reasonably close to optimal and easy to work with, yielding upper bounds on $\Pr[X > a]$ that are, one hopes, good enough for our purposes. Bounds on $\Pr[X < a]$ are similar. For any positive $\lambda$ we bound

$$\Pr[X < a] = \Pr[e^{-\lambda X} > e^{-\lambda a}] < E[e^{-\lambda X}]\,e^{\lambda a}.$$


Chernoff bound arguments tend to be cleaner when $E[X] = 0$. A simple translation, replacing $X$ by $X - \mu$ where $\mu = E[X]$, is often quite helpful.

It is instructive to examine the case when $N$ is the standard normal distribution and $a$ is positive. In this instance $E[e^{\lambda N}] = e^{\lambda^2/2}$ and so

$$\Pr[N > a] = \Pr[e^{\lambda N} > e^{\lambda a}] < E[e^{\lambda N}]\,e^{-\lambda a} = e^{\lambda^2/2 - \lambda a}.$$

Elementary calculus leads to the optimal choice $\lambda = a$ so that

$$\Pr[N > a] < e^{-a^2/2}.$$

This compares well, as $a \to \infty$, with the actual asymptotics

$$\Pr[N > a] = \frac{1}{\sqrt{2\pi}}\int_a^{\infty} e^{-t^2/2}\,dt \sim \frac{1}{a\sqrt{2\pi}}\,e^{-a^2/2}.$$

Results with $N$ being normal with mean $\mu$ and variance $\sigma^2$ are similarly good. This explains, to some extent, the efficacy of the Chernoff bounds. When a random variable $X$ is "roughly" normal the Chernoff bounds on $\Pr[X > a]$ should be quite close to the actual values for $a$ large. In practice, however, precise calculations of $E[e^{\lambda X}]$ can be difficult or impossible to achieve and there can be considerable art in finding approximations for $E[e^{\lambda X}]$ that will allow for good bounds on $\Pr[X > a]$.
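The comparison for the standard normal can be made concrete. The sketch below evaluates the exact tail through the complementary error function, the leading-order asymptotic expression, and the Chernoff bound $e^{-a^2/2}$. The function names and the sample points are illustrative.

```python
from math import erfc, exp, pi, sqrt

def normal_tail(a):
    """Exact Pr[N > a] for a standard normal N, via the complementary error function."""
    return 0.5 * erfc(a / sqrt(2))

def chernoff_normal(a):
    """Chernoff bound exp(-a^2/2), obtained with lambda = a."""
    return exp(-a * a / 2)

def asymptotic_tail(a):
    """Leading-order asymptotics exp(-a^2/2) / (a * sqrt(2*pi))."""
    return exp(-a * a / 2) / (a * sqrt(2 * pi))

rows = [(a, normal_tail(a), asymptotic_tail(a), chernoff_normal(a)) for a in (1, 2, 4, 6)]
```

The Chernoff bound overshoots the true tail only by the factor $a\sqrt{2\pi}$ (up to lower-order corrections), which is negligible on the logarithmic scale.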


Many of our remaining results will deal with distributions $X$ of the following prescribed type.


Assumptions A.1.3

• $p_1, \ldots, p_n \in [0, 1]$,

• $X_1, \ldots, X_n$ are mutually independent with $\Pr[X_i = 1 - p_i] = p_i$ and $\Pr[X_i = -p_i] = 1 - p_i$,

• $p = (p_1 + \cdots + p_n)/n$ and $X = X_1 + \cdots + X_n$.


Remark. Clearly $E[X] = E[X_i] = 0$. When all $p_i = 1/2$, $X$ has distribution $S_n/2$. When all $p_i = p$, $X$ has distribution $B(n, p) - np$, where $B(n, p)$ is the usual binomial distribution.
Theorem A.1.4 Under Assumptions A.1.3 and with $a > 0$,

$$\Pr[X > a] < e^{-2a^2/n}.$$

Lemma A.1.5 For all reals $\alpha, \beta$ with $|\alpha| \le 1$,

$$\cosh(\beta) + \alpha\sinh(\beta) \le e^{\beta^2/2 + \alpha\beta}.$$


Proof. This is immediate if $\alpha = \pm 1$ or $|\beta| \ge 100$. If the lemma were false the function $f(\alpha, \beta) = \cosh(\beta) + \alpha\sinh(\beta) - e^{\beta^2/2 + \alpha\beta}$ would assume a positive global maximum in the interior of the rectangle $R = \{(\alpha, \beta) : |\alpha| \le 1, |\beta| \le 100\}$. Setting partial derivatives equal to zero we find

$$\sinh(\beta) + \alpha\cosh(\beta) = (\alpha + \beta)\,e^{\beta^2/2 + \alpha\beta},$$
$$\sinh(\beta) = \beta\,e^{\beta^2/2 + \alpha\beta},$$

and thus $\tanh(\beta) = \beta$, which implies $\beta = 0$. But $f(\alpha, 0) = 0$ for all $\alpha$, a contradiction. ∎


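Lemma A.1.6 For all $\theta \in [0, 1]$ and all $\lambda$, $\theta e^{\lambda(1-\theta)} + (1-\theta)e^{-\lambda\theta} \le e^{\lambda^2/8}$.

Proof. Setting $\theta = \frac12(1 + \alpha)$ and $\lambda = 2\beta$, this lemma reduces to Lemma A.1.5. ∎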


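Proof [Theorem A.1.4]. Let, for the moment, $\lambda > 0$ be arbitrary.

$$E[e^{\lambda X_i}] = p_i e^{\lambda(1-p_i)} + (1-p_i)e^{-\lambda p_i} \le e^{\lambda^2/8}$$

by Lemma A.1.6. Then

$$E[e^{\lambda X}] = \prod_{i=1}^{n} E[e^{\lambda X_i}] \le e^{\lambda^2 n/8}.$$

Applying Markov's Inequality,

$$\Pr[X > a] = \Pr[e^{\lambda X} > e^{\lambda a}] < E[e^{\lambda X}]/e^{\lambda a} \le e^{\lambda^2 n/8 - \lambda a}.$$

We set $\lambda = 4a/n$ to optimize the inequality: $\Pr[X > a] < e^{-2a^2/n}$, as claimed. ∎

Again by symmetry we immediately have the following.

Corollary A.1.7 Under Assumptions A.1.3 and with $a > 0$,

$$\Pr[|X| > a] < 2e^{-2a^2/n}.$$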
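Under Assumptions A.1.3 with $\lambda$ arbitrary,

$$E[e^{\lambda X}] = \prod_{i=1}^{n} E[e^{\lambda X_i}] = \prod_{i=1}^{n}\left(p_i e^{\lambda(1-p_i)} + (1-p_i)e^{-\lambda p_i}\right) = e^{-\lambda pn}\prod_{i=1}^{n}\left(p_i e^{\lambda} + (1 - p_i)\right).$$

With $\lambda$ fixed, the function

$$f(x) = \ln(xe^{\lambda} + 1 - x) = \ln(Bx + 1) \quad\text{with } B = e^{\lambda} - 1$$

is concave and hence $\sum_{i=1}^{n} f(p_i) \le nf(p)$ (Jensen's Inequality). Exponentiating both sides,

$$\prod_{i=1}^{n}\left(p_i e^{\lambda} + (1 - p_i)\right) \le \left(pe^{\lambda} + (1-p)\right)^n,$$

so that we have the following.

Lemma A.1.8 Under the Assumptions A.1.3,

$$E[e^{\lambda X}] \le e^{-\lambda pn}\left(pe^{\lambda} + (1-p)\right)^n.$$

Applying this lemma with inequality (A.1) yields the following.

Theorem A.1.9 Under the Assumptions A.1.3 and with $a > 0$,

$$\Pr[X > a] < e^{-\lambda pn}\left(pe^{\lambda} + (1-p)\right)^n e^{-\lambda a}$$

for all $\lambda > 0$.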


Remark. For given $p, n, a$, an optimal assignment of $\lambda$ in Theorem A.1.9 is found by elementary calculus to be

$$e^{\lambda} = \left[\frac{1-p}{p}\right]\left[\frac{a + np}{n - (a + np)}\right].$$

This value is oftentimes too cumbersome to be useful. We employ suboptimal $\lambda$ to achieve more convenient results.


Setting $\lambda = \ln(1 + a/pn)$ and using the fact that $(1 + a/n)^n \le e^a$, Theorem A.1.9 implies the following.


Corollary A.1.10 $\Pr[X > a] < e^{a - pn\ln(1 + a/pn) - a\ln(1 + a/pn)}$.


To simplify further, apply the inequality $\ln(1+u) \ge u - u^2/2$, valid for all $u \ge 0$, to Corollary A.1.10 with $u = a/pn$. This gives the following.


Theorem A.1.11 $\Pr[X > a] < e^{-a^2/2pn + a^3/2(pn)^2}$.

When all $p_i = p$, $X$ has variance $np(1-p)$. With $p = o(1)$ and $a = o(pn)$ this bound reflects the approximation of $X$ by a normal distribution with variance $\sim np$.
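The chain of simplifications from Theorem A.1.9 to Theorem A.1.11 trades accuracy for convenience at each step, which can be seen numerically in the case where all $p_i = p$. The sketch below compares the exact binomial tail with the three bounds. The optimal $\lambda$ used is the one obtained by setting the derivative of the exponent in Theorem A.1.9 to zero, that is, $e^{\lambda} = \frac{1-p}{p}\cdot\frac{a+np}{n-(a+np)}$. The function names and the parameter values are illustrative.

```python
from math import comb, exp, log

def exact_upper_tail(n, p, a):
    """Pr[B(n, p) - np > a], computed from the binomial probabilities."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(n + 1) if k - n * p > a)

def bound_a19(n, p, a, lam):
    """Theorem A.1.9: e^{-lam p n} (p e^lam + 1 - p)^n e^{-lam a}."""
    return exp(-lam * p * n) * (p * exp(lam) + 1 - p) ** n * exp(-lam * a)

def bound_a110(n, p, a):
    """Corollary A.1.10: exp(a - (pn + a) ln(1 + a/pn))."""
    return exp(a - (p * n + a) * log(1 + a / (p * n)))

def bound_a111(n, p, a):
    """Theorem A.1.11: exp(-a^2/(2pn) + a^3/(2(pn)^2))."""
    return exp(-a * a / (2 * p * n) + a**3 / (2 * (p * n) ** 2))

n, p, a = 200, 0.1, 10
lam_opt = log((1 - p) / p * (a + n * p) / (n - (a + n * p)))   # minimizer of the A.1.9 exponent
values = (exact_upper_tail(n, p, a), bound_a19(n, p, a, lam_opt),
          bound_a110(n, p, a), bound_a111(n, p, a))
```

For these parameters the four entries of `values` increase from left to right: the exact tail, then the optimized bound, then the two successively simpler closed forms.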


The bound of Theorem A.1.11 hits a minimum at $a = 2pn/3$. For $a > 2pn/3$ we have the simple bound

$$\Pr[X > a] \le \Pr[X > 2pn/3] < e^{-2pn/27}.$$

This is improved by the following.


Theorem A.1.12 For $\beta > 1$,

$$\Pr[X > (\beta - 1)pn] < \left[e^{\beta-1}\beta^{-\beta}\right]^{pn}.$$

Proof. Direct "plug in" to Corollary A.1.10. ∎


$X + pn$ may be interpreted as the number of successes in $n$ independent trials when the probability of success in the $i$th trial is $p_i$.


Theorem A.1.13 Under Assumptions A.1.3 and with $a > 0$,

$$\Pr[X < -a] < e^{-a^2/2pn}.$$

Note that one cannot simply employ "symmetry" as then the roles of $p$ and $1 - p$ are interchanged.


Proof. Let $\lambda > 0$ be, for the moment, arbitrary. Then by the argument preceding Lemma A.1.8,

$$E[e^{-\lambda X}] \le e^{\lambda pn}\left(pe^{-\lambda} + (1-p)\right)^n.$$

Thus

$$\Pr[X < -a] = \Pr[e^{-\lambda X} > e^{\lambda a}] < e^{\lambda pn}\left(pe^{-\lambda} + (1-p)\right)^n e^{-\lambda a},$$

analogous to Theorem A.1.9. We employ the inequality $1 + u \le e^u$, valid for all $u$, so that

$$pe^{-\lambda} + (1-p) = 1 + (e^{-\lambda} - 1)p \le e^{p(e^{-\lambda} - 1)}$$

and

$$\Pr[X < -a] < e^{\lambda pn + np(e^{-\lambda} - 1) - \lambda a} = e^{np(e^{-\lambda} - 1 + \lambda) - \lambda a}.$$

We employ the inequality

$$e^{-\lambda} \le 1 - \lambda + \lambda^2/2,$$

valid for all $\lambda > 0$. (Note: The analogous inequality $e^{\lambda} \le 1 + \lambda + \lambda^2/2$ is not valid for $\lambda > 0$ and so this method, when applied to $\Pr[X > a]$, requires an "error" term as the one found in Theorem A.1.11.) Now

$$\Pr[X < -a] < e^{np\lambda^2/2 - \lambda a}.$$

Set $\lambda = a/np$ to optimize the inequality: $\Pr[X < -a] < e^{-a^2/2pn}$ as claimed. ∎
For clarity the following result is often useful. 


Corollary A.1.14 Let $Y$ be the sum of mutually independent indicator random variables, $\mu = E[Y]$. For all $\varepsilon > 0$,

$$\Pr[|Y - \mu| > \varepsilon\mu] < 2e^{-c_\varepsilon\mu},$$

where $c_\varepsilon > 0$ depends only on $\varepsilon$.

Proof. Apply Theorems A.1.12 and A.1.13 with $Y = X + pn$ and

$$c_\varepsilon = \min\{-\ln(e^{\varepsilon}(1+\varepsilon)^{-(1+\varepsilon)}),\ \varepsilon^2/2\}.$$
∎
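The constant $c_\varepsilon$ is explicit, so the corollary can be checked directly against a binomial distribution, which is one instance of a sum of independent indicators. The sketch below uses the identity $-\ln(e^{\varepsilon}(1+\varepsilon)^{-(1+\varepsilon)}) = (1+\varepsilon)\ln(1+\varepsilon) - \varepsilon$. The function names and the parameters $n = 400$, $p = 1/4$ are illustrative choices.

```python
from math import comb, exp, log

def c_eps(eps):
    """c_eps = min{ -ln(e^eps (1+eps)^-(1+eps)), eps^2 / 2 }."""
    return min((1 + eps) * log(1 + eps) - eps, eps * eps / 2)

def two_sided_tail(n, p, eps):
    """Exact Pr[|Y - mu| > eps * mu] for Y ~ Binomial(n, p), mu = n p."""
    mu = n * p
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(n + 1) if abs(k - mu) > eps * mu)

n, p = 400, 0.25
checks = [(eps, two_sided_tail(n, p, eps), 2 * exp(-c_eps(eps) * n * p))
          for eps in (0.1, 0.25, 0.5)]
```

Each entry of `checks` pairs the exact two-sided tail with the bound $2e^{-c_\varepsilon\mu}$. For small $\varepsilon$ the bound can exceed 1 and is then vacuous; it becomes informative once $c_\varepsilon\mu$ is appreciably larger than $\ln 2$.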


The asymmetry between $\Pr[X < a]$ and $\Pr[X > a]$ given by Theorems A.1.12 and A.1.13 is real. The estimation of $X$ by a normal distribution with zero mean and variance $np$ is roughly valid for estimating $\Pr[X < a]$ for any $a$ and for estimating $\Pr[X > a]$ while $a = o(np)$. But when $a$ and $np$ are comparable or when $a \gg np$ the Poisson behavior "takes over" and $\Pr[X > a]$ cannot be accurately estimated by using the normal distribution.


We conclude with several large deviation results involving distributions other than 
sums of indicator random variables. 


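Theorem A.1.15 Let $P$ have Poisson distribution with mean $\mu$. For $\varepsilon > 0$,

$$\Pr[P \le \mu(1 - \varepsilon)] \le e^{-\varepsilon^2\mu/2},$$
$$\Pr[P \ge \mu(1 + \varepsilon)] \le \left[e^{\varepsilon}(1+\varepsilon)^{-(1+\varepsilon)}\right]^{\mu}.$$

Proof. For any $s$,

$$\Pr[P = s] = \lim_{n \to \infty}\Pr\left[B\left(n, \frac{\mu}{n}\right) = s\right].$$

Apply Theorems A.1.12 and A.1.13. ∎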
Theorem A.1.16 Let $X_i$, $1 \le i \le n$, be mutually independent with all $E[X_i] = 0$ and all $|X_i| \le 1$. Set $S = X_1 + \cdots + X_n$. Then

$$\Pr[S > a] < e^{-a^2/2n}.$$


Proof. Set, as in the proof of Theorem A.1.1, $\lambda = a/n$. Set

$$h(x) = \frac{e^{\lambda} + e^{-\lambda}}{2} + \frac{e^{\lambda} - e^{-\lambda}}{2}\,x.$$

For $x \in [-1, 1]$, $e^{\lambda x} \le h(x)$. [$y = h(x)$ is the chord through the points $x = \pm 1$ of the convex curve $y = e^{\lambda x}$.] Thus

$$E[e^{\lambda X_i}] \le E[h(X_i)] = h(E[X_i]) = h(0) = \cosh\lambda.$$


The remainder of the proof follows as in Theorem A.1.1. ∎

Theorem A.1.17 Suppose $E[X] = 0$ and no two values of $X$ are ever more than one apart. Then for all $\lambda \ge 0$,

$$E[e^{\lambda X}] \le e^{\lambda^2/8}.$$

Proof. Fix $b \in [-1, 1]$ with $X \in [\frac12(-1 + b), \frac12(+1 + b)]$. Let $y = h(x)$ be the straight line intersecting the curve $y = e^{\lambda x}$ at the points $\frac12(\pm 1 + b)$. As $e^{\lambda x}$ is a convex function, $e^{\lambda x} \le h(x)$ for all $x \in [\frac12(-1 + b), \frac12(+1 + b)]$. Thus

$$E[e^{\lambda X}] \le E[h(X)] = h(E[X]) = h(0).$$

We calculate $h(0) = e^{\lambda b/2}[\cosh(\lambda/2) - b\sinh(\lambda/2)]$, which is at most $e^{\lambda^2/8}$ by Lemma A.1.5. ∎


Theorem A.1.18 Let $X_i$, $1 \le i \le n$, be independent random variables with each $E[X_i] = 0$ and no two values of any $X_i$ ever more than one apart. (We allow, however, values of different $X_i$, $X_j$ to be further apart.) Set $S = X_1 + \cdots + X_n$. Then

$$\Pr[S > a] < e^{-2a^2/n}.$$

Proof. $E[e^{\lambda S}] = \prod_{i=1}^{n} E[e^{\lambda X_i}] \le e^{n\lambda^2/8}$ by Theorem A.1.17. Then for $\lambda \ge 0$,

$$\Pr[S > a] = \Pr[e^{\lambda S} > e^{\lambda a}] \le \exp\left[\frac{n\lambda^2}{8} - \lambda a\right]$$

and we set $\lambda = 4a/n$. ∎


We have been roughly guided by the notion that if $X$ has mean zero and variance $\sigma^2$ then $\Pr[X > a\sigma]$ should go like $e^{-a^2/2}$. There are times when this idea is very wrong. Consider Assumptions A.1.3 with all $p_i = 1/n$ so that $X = P_n - 1$, where $P_n$ has the binomial distribution $B(n, 1/n)$, which is asymptotically $P$, the Poisson distribution with mean one. Then $E[X] = 0$ and $\mathrm{Var}[X] \sim 1$. For $a$ fixed, $\Pr[X = a] \to 1/e(a+1)!$, which is far bigger than $e^{-a^2/2}$. With this cautionary preamble, we give a general situation for which the notion is asymptotically correct when $a$ is not too large.


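Theorem A.1.19 For every $C > 0$ and $\varepsilon > 0$ there exists $\delta > 0$ so that the following holds: Let $\{X_i\}_{i=1}^{n}$, $n$ arbitrary, be independent random variables with $E[X_i] = 0$, $|X_i| \le C$ and $\mathrm{Var}[X_i] = \sigma_i^2$. Set $X = \sum_{i=1}^{n} X_i$ and $\sigma^2 = \sum_{i=1}^{n}\sigma_i^2$ so that $\mathrm{Var}[X] = \sigma^2$. Then for $0 < a \le \delta\sigma$,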
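$$\Pr[X > a\sigma] < e^{-\frac{a^2}{2}(1 - \varepsilon)}.$$

Proof. We set $\lambda = a/\sigma$ so that $0 < \lambda \le \delta$. Then

$$E[e^{\lambda X_i}] = \sum_{k=0}^{\infty}\frac{\lambda^k}{k!}E[X_i^k] = 1 + \frac{\lambda^2}{2}\sigma_i^2 + \sum_{k=3}^{\infty}\frac{\lambda^k}{k!}E[X_i^k].$$

As $|X_i^k| \le C^{k-2}X_i^2$ we bound

$$E[X_i^k] \le E[|X_i^k|] \le C^{k-2}E[X_i^2] = C^{k-2}\sigma_i^2.$$

For $k \ge 3$ we bound $2/k! \le 1/(k-2)!$ so that

$$E[e^{\lambda X_i}] \le 1 + \frac{\lambda^2}{2}\sigma_i^2\left[1 + \sum_{k=3}^{\infty}\frac{(C\lambda)^{k-2}}{(k-2)!}\right] = 1 + \frac{\lambda^2}{2}\sigma_i^2 e^{C\lambda}.$$

We choose $\delta$ to satisfy $e^{C\delta} \le 1 + \varepsilon$. As $\lambda \le \delta$,

$$E[e^{\lambda X_i}] \le 1 + \frac{\lambda^2}{2}\sigma_i^2(1 + \varepsilon) \le \exp\left[\frac{\lambda^2}{2}\sigma_i^2(1 + \varepsilon)\right].$$

This inequality has held for all $X_i$ so

$$E[e^{\lambda X}] = \prod_{i=1}^{n} E[e^{\lambda X_i}] \le \exp\left[\frac{\lambda^2}{2}\sigma^2(1 + \varepsilon)\right]$$

and

$$\Pr[X > a\sigma] < E[e^{\lambda X}]\,e^{-\lambda a\sigma} \le e^{-\frac{a^2}{2}(1 - \varepsilon)}.$$
∎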

A.2 LOWER BOUNDS


The Chernoff bounds of the previous section give upper bounds for $\Pr[X > a]$ by examining one value (albeit, the right one!) of the Laplace transform $E[e^{\lambda X}]$. Here we use three values of the Laplace transform to give lower bounds for $\Pr[X > a]$. We shall set

$$f(\lambda) = E[e^{\lambda X}],$$
$$g_a(\lambda) = f(\lambda)\,e^{-\lambda a}.$$

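With this notation $\Pr[X > a] \le g_a(\lambda)$ and the Chernoff bound is achieved by taking that $\lambda$ minimizing $g_a(\lambda)$. For any positive $u$ and $\varepsilon$,

$$X \ge a + u \ \Rightarrow\ \lambda X \le (\lambda + \varepsilon)X - \varepsilon a - \varepsilon u,$$
$$X \le a - u \ \Rightarrow\ \lambda X \le (\lambda - \varepsilon)X + \varepsilon a - \varepsilon u,$$

so that

$$E[e^{\lambda X}\chi(X \ge a + u)] \le f(\lambda + \varepsilon)\,e^{-\varepsilon a}e^{-\varepsilon u},$$
$$E[e^{\lambda X}\chi(X \le a - u)] \le f(\lambda - \varepsilon)\,e^{\varepsilon a}e^{-\varepsilon u}.$$

Subtracting these from $E[e^{\lambda X}]$ yields

$$E[e^{\lambda X}\chi(|X - a| < u)] \ge f(\lambda) - e^{-\varepsilon u}\left[f(\lambda + \varepsilon)e^{-\varepsilon a} + f(\lambda - \varepsilon)e^{\varepsilon a}\right].$$

When $|X - a| < u$, $e^{\lambda X} \le e^{\lambda u}e^{\lambda a}$ so

$$\Pr[|X - a| < u] \ge e^{-\lambda u}e^{-\lambda a}E[e^{\lambda X}\chi(|X - a| < u)].$$

But $\Pr[X > a - u] \ge \Pr[|X - a| < u]$, giving our general result as follows.

Theorem A.2.1 For any $a, u, \lambda, \varepsilon$ with $u, \lambda, \varepsilon, \lambda - \varepsilon$ all positive,

$$\Pr[X > a - u] \ge e^{-\lambda u}\left[g_a(\lambda) - e^{-\varepsilon u}\left[g_a(\lambda + \varepsilon) + g_a(\lambda - \varepsilon)\right]\right].$$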


We note that this bound has used only three values of the Laplace transform: 


fA), fA =), J Ae): 


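It is instructive to examine the case when $N$ is the standard normal distribution.
Here $f(\lambda) = E[e^{\lambda N}] = e^{\lambda^2/2}$. We assume $a$ is positive and are interested
in the asymptotics as $a \to +\infty$. We set $\lambda = a$ so that $g_a(\lambda) = e^{-a^2/2}$. Now

$$g_a(\lambda \pm \epsilon) = e^{(\lambda \pm \epsilon)^2/2 - a(\lambda \pm \epsilon)} = g_a(\lambda) e^{\epsilon^2/2}.$$

The cancellation of the linear (in $\epsilon$) terms was not serendipity, but rather reflected the
critical choice of $\lambda$ to minimize $\ln(g_a(\lambda))$. Now

$$\Pr[N > a - u] \ge g_a(a) e^{-au}\left[1 - 2e^{\epsilon^2/2 - \epsilon u}\right].$$

Suppose we take $\epsilon = u = 2$. This gives

$$\Pr[N > a - 2] \ge e^{-a^2/2} e^{-2a}\left[1 - 2e^{-2}\right].$$

Rescaling: $\Pr[N > a] = \Omega(e^{-a^2/2} e^{-4a})$. In contrast we have the upper bound

$$\Pr[N > a] \le e^{-a^2/2}.$$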
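The standard normal example can be reproduced numerically. The sketch below evaluates the right-hand side of Theorem A.2.1 with $f(\lambda) = e^{\lambda^2/2}$ and compares it with the exact tail, computed through the complementary error function. The function names are illustrative assumptions, not notation from the text.

```python
import math


def g_normal(a, lam):
    """g_a(lam) = E[exp(lam * N)] * exp(-lam * a) for a standard normal N."""
    return math.exp(0.5 * lam * lam - lam * a)


def lower_bound(a, u, lam, eps):
    """Right-hand side of Theorem A.2.1, specialised to the standard normal."""
    if min(u, lam, eps, lam - eps) <= 0:
        raise ValueError("u, lam, eps and lam - eps must all be positive")
    inner = g_normal(a, lam) - math.exp(-eps * u) * (
        g_normal(a, lam + eps) + g_normal(a, lam - eps)
    )
    return math.exp(-lam * u) * inner


def normal_tail(x):
    """Exact Pr[N > x] via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


if __name__ == "__main__":
    for a in (3.0, 5.0, 8.0):
        lo = lower_bound(a, u=2.0, lam=a, eps=2.0)
        exact = normal_tail(a - 2.0)
        hi = math.exp(-0.5 * (a - 2.0) ** 2)  # Chernoff upper bound for Pr[N > a - 2]
        print(f"a={a}: {lo:.3e} <= {exact:.3e} <= {hi:.3e}")
```

With $\lambda = a$ and $\epsilon = u = 2$ the computed lower bound agrees with the closed form $e^{-a^2/2}e^{-2a}[1 - 2e^{-2}]$; the choice requires $a > 2$ so that $\lambda - \epsilon$ is positive.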

In many applications one does not have the precise values of the Laplace transform
$f(\lambda)$. Suppose, however, that we have reasonably good estimates in both directions
on $f(\lambda)$. Then Theorem A.2.1 will give a lower bound for $\Pr[X > a - u]$ by using
a lower bound for $g_a(\lambda)$ and upper bounds for $g_a(\lambda \pm \epsilon)$. Our goal will be less
ambitious than the estimate achieved for the standard normal $N$. We shall be content
to find the asymptotics of the logarithm of $\Pr[X > a]$. In the next result, the $X_n$
may be imagined to be near the normal distribution. The interval for $\lambda$ could easily
be replaced by $[(1 - \gamma)a_n, (1 + \gamma)a_n]$ for any fixed positive $\gamma$.

Theorem A.2.2 Let $X_n$ be a sequence of random variables and $a_n$ a sequence of
positive reals with $\lim_{n \to \infty} a_n = \infty$. Assume

$$E[e^{\lambda X_n}] = e^{\frac{\lambda^2}{2}(1 + o(1))}$$

uniformly for $\frac{1}{2}a_n \le \lambda \le \frac{3}{2}a_n$. Then

$$\ln \Pr[X_n > a_n] \sim -\frac{a_n^2}{2}.$$


Remark. For $X_n = S_n n^{-1/2}$, $E[e^{\lambda X_n}] = \cosh^n(\lambda n^{-1/2})$. When $u \to 0$,
$\ln\cosh(u) \sim \frac{1}{2}u^2$. The conditions of Theorem A.2.2 therefore hold when
$a_n = o(\sqrt{n})$ and $a_n \to +\infty$. That is, $\ln \Pr[S_n > b_n] \sim -b_n^2/2n$ when
$\sqrt{n} \ll b_n \ll n$.
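The asymptotic statement in the Remark can be compared with an exact computation for a moderate $n$. The sketch below sums binomial coefficients with exact integers to obtain $\ln \Pr[S_n > b]$; the function name and the sample values $n = 10000$, $b = 1000$ are illustrative assumptions. Since the statement is only asymptotic, the ratio printed at the end should be near, but not equal to, one.

```python
import math


def log_tail_sn(n, b):
    """Return ln Pr[S_n > b] for integers 0 <= b < n, using exact integer arithmetic.

    S_n = 2 * H - n, where H counts the +1 signs, so S_n > b iff H > (n + b) / 2.
    """
    first = (n + b) // 2 + 1              # smallest integer H with 2 * H - n > b
    term = math.comb(n, first)            # sign patterns with exactly `first` pluses
    total = 0
    for h in range(first, n + 1):
        total += term
        term = term * (n - h) // (h + 1)  # C(n, h + 1) from C(n, h); division is exact
    return math.log(total) - n * math.log(2.0)


if __name__ == "__main__":
    n, b = 10_000, 1_000                  # sqrt(n) = 100 lies well below b, and b below n
    exact = log_tail_sn(n, b)
    approx = -b * b / (2 * n)
    print(f"exact = {exact:.2f}, -b^2/2n = {approx:.2f}, ratio = {exact / approx:.3f}")
```

The exact logarithm always lies below $-b^2/2n$, because the Chernoff bound $\Pr[S_n > b] \le e^{-b^2/2n}$ holds for every $n$.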
Proof. The upper bound is the Chernoff bound with $\lambda = a_n$:

$$\Pr[X_n > a_n] \le E[e^{\lambda X_n}] e^{-\lambda a_n} = e^{-\frac{a_n^2}{2}(1 + o(1))}.$$

For the lower bound we first let $\delta \in (0, 0.01)$ be fixed. We set $\lambda = a = a_n(1 + \delta)$,
$u = a_n\delta$, $\epsilon = a\delta/10$. Applying Theorem A.2.1 (note that $a - u = a_n$),

$$\Pr[X_n > a_n] \ge e^{-\lambda u} B$$

with

$$B = g_a(a) - e^{-\epsilon u}\left[g_a(a + \epsilon) + g_a(a - \epsilon)\right].$$

But $\ln[g_a(a)] \sim -\frac{1}{2}a^2$ and, analogous to our result for the standard normal,

$$\ln[g_a(a \pm \epsilon)] \sim \frac{a^2}{2}\left(1 \pm \frac{\delta}{10}\right)^2 - a^2\left(1 \pm \frac{\delta}{10}\right) = -\frac{a^2}{2}\left(1 - \frac{\delta^2}{100}\right).$$

As $\epsilon u = a^2\delta^2/10(1 + \delta)$ we have $e^{-\epsilon u} g_a(a \pm \epsilon) \ll g_a(a)$. Now $B$ is dominated by
its initial term and

$$\Pr[X_n > a_n] \ge e^{-\lambda u} g_a(a)(1 - o(1)).$$

Taking logarithms:

$$\ln[\Pr[X_n > a_n]] \ge -a_n^2\delta(1 + \delta) - \frac{a_n^2}{2}(1 + \delta)^2(1 + o(1)) - o(1).$$

As this holds for any fixed $\delta \in (0, 0.01)$

$$\ln[\Pr[X_n > a_n]] \ge -\frac{a_n^2}{2}(1 + o(1)).$$

We have seen that $\Pr[S_n > b_n]$ can be well approximated by $\Pr[\sqrt{n}N > b_n]$
as long as $\sqrt{n} \ll b_n \ll n$. For $b_n = \Theta(n)$ this approximation by the normal
distribution is no longer valid. Still, we shall see that the Chernoff bounds continue
to give the right asymptotic value for $\ln \Pr[S_n > b_n]$. We place this in a somewhat
wider context. Ellis (1984) has given far more general results.


Theorem A.2.3 Let $Z_n$ be a sequence of random variables. Let $a$ be a fixed positive
real. Set

$$F(\lambda) = \lim_{n \to \infty} \frac{1}{n} \ln E[e^{\lambda Z_n}].$$

Suppose that there exists $\lambda > 0$ and an open interval $I$ containing $\lambda$ such that
1. $F(s)$ exists and has a first and second derivative for all $s \in I$.
2. $F'(\lambda) = a$.
3. $F'$ is a strictly increasing function in $I$.
4. There is a $K$ so that $|F''(s)| \le K$ for all $s \in I$.
Then

$$\lim_{n \to \infty} \frac{1}{n} \ln \Pr[Z_n > an] = F(\lambda) - a\lambda.$$


Remark. Let $X$ be a random variable whose Laplace transform is well defined. Let
$Z_n$ denote the sum of $n$ independent copies of $X$. Then $F(\lambda) = \ln E[e^{\lambda X}]$. In
particular, suppose $\Pr[X = 1] = \Pr[X = -1] = \frac{1}{2}$ so that $Z_n = S_n$. Then
$F(\lambda) = \ln\cosh(\lambda)$. For any $a \in (0, 1)$ there is a positive $\lambda$ for which
$a = F'(\lambda) = \tanh(\lambda)$. The conditions of Theorem A.2.3 hold and give the asymptotics
of $\ln \Pr[S_n > an]$.
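For the case $Z_n = S_n$ described in the Remark, the limit $F(\lambda) - a\lambda$ can be evaluated directly and compared with an exact finite-$n$ computation. The following is a minimal sketch under stated assumptions: the function names and the sample values $n = 2000$, $a = 0.5$ are illustrative and not part of the text.

```python
import math


def rate(a):
    """Return F(lam) - a * lam with F = ln cosh and lam chosen so that tanh(lam) = a."""
    lam = math.atanh(a)
    return math.log(math.cosh(lam)) - a * lam


def exact_log_tail_per_n(n, a):
    """Return (1/n) * ln Pr[S_n > a * n], summing binomial coefficients exactly."""
    first = math.floor((n + a * n) / 2) + 1  # smallest H with 2 * H - n > a * n
    term = math.comb(n, first)
    total = 0
    for h in range(first, n + 1):
        total += term
        term = term * (n - h) // (h + 1)     # C(n, h + 1) from C(n, h); exact division
    return (math.log(total) - n * math.log(2.0)) / n


if __name__ == "__main__":
    n, a = 2000, 0.5
    print(f"F(lam) - a*lam        = {rate(a):.5f}")
    print(f"(1/n) ln Pr[S_n > an] = {exact_log_tail_per_n(n, a):.5f}")
```

The finite-$n$ value sits slightly below the limit, consistent with the Chernoff upper bound used in the first line of the proof; the gap is of order $(\ln n)/n$.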


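Proof. The upper bound is the Chernoff bound as

$$\Pr[Z_n > an] \le E[e^{\lambda Z_n}] e^{-\lambda an} = e^{n(F(\lambda) - a\lambda + o(1))}.$$

For the lower bound we will apply Theorem A.2.1. First note that since $F'$ is
continuous and monotone over $I$ it has a continuous inverse $H$ defined over some
interval $J$ containing $a$. Note $H(a) = \lambda$. Let $u$ be positive and sufficiently small so
that $H(a + u) \pm u/K \in I$. As

$$\lim_{u \to 0} H(a + u) \pm \frac{u}{K} = H(a) = \lambda,$$

all sufficiently small $u$ satisfy this criterion.
Set $a^* = a + u$ and $\lambda^* = H(a^*)$ so that $F'(\lambda^*) = a^*$. We define

$$g_n(s) = E[e^{sZ_n}] e^{-sa^*n}.$$

Theorem A.2.1 (noting that $an = a^*n - un$) states

$$\Pr[Z_n > an] \ge e^{-\lambda^* un}\left[g_n(\lambda^*) - e^{-\epsilon un}\left[g_n(\lambda^* + \epsilon) + g_n(\lambda^* - \epsilon)\right]\right].$$

We select $\epsilon = u/K$. Our selection of $u$ assures us that $\lambda^* \pm \epsilon$ belong to $I$. We have

$$\lim_{n \to \infty} \frac{1}{n} \ln\left(\frac{e^{-\epsilon un} g_n(\lambda^* + \epsilon)}{g_n(\lambda^*)}\right) = -\epsilon u + F(\lambda^* + \epsilon) - F(\lambda^*) - \epsilon a^*.$$

We have selected $\lambda^*$ so that $F'(\lambda^*) = a^*$. Since $|F''(s)| \le K$ in the interval $I$,
Taylor series bounds

$$|F(\lambda^* + \epsilon) - F(\lambda^*) - \epsilon a^*| \le \frac{K}{2}\epsilon^2.$$

Our choice of $\epsilon$ (chosen to minimize the quadratic though any sufficiently small $\epsilon$
would do) gives that

$$-\epsilon u + F(\lambda^* + \epsilon) - F(\lambda^*) - \epsilon a^* \le -\frac{u^2}{2K}.$$

Thus $e^{-\epsilon un} g_n(\lambda^* + \epsilon)/g_n(\lambda^*)$ drops exponentially quickly. We only use that for $n$
sufficiently large the ratio is less than 0.25. The same argument shows that for $n$
sufficiently large $e^{-\epsilon un} g_n(\lambda^* - \epsilon)/g_n(\lambda^*) < 0.25$. For such $n$ we then have

$$\Pr[Z_n > an] \ge \frac{1}{2} e^{-\lambda^* un} g_n(\lambda^*).$$

This lower bound is $\exp[n(F(\lambda^*) - \lambda^*a^* - \lambda^*u + o(1))]$. Now consider
$F(\lambda^*) - \lambda^*a^* - \lambda^*u$ as a function of $u$. As $u \to 0$, $\lambda^* = H(a + u) \to H(a) = \lambda$.
As $F$ is continuous $F(\lambda^*) \to F(\lambda)$. Clearly $a^* = a + u \to a$ and therefore
$\lambda^*a^* \to \lambda a$, while $\lambda^*u \to 0$. Thus

$$F(\lambda^*) - \lambda^*a^* - \lambda^*u \to F(\lambda) - \lambda a$$

and

$$\Pr[Z_n > an] \ge e^{n(F(\lambda) - \lambda a + o(1))}.$$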


Remark. Let $Z_n$ be a sequence of random variables with mean and variance $\mu_n$ and
$\sigma_n^2$, respectively. The analysis of $\Pr[Z_n > \mu_n + \lambda_n\sigma_n]$ frequently ($S_n$ being the
premier example) splits into three parts:


1. Small Deviations. $\lambda_n \to \lambda$, a positive constant. One hopes to prove asymptotic
normality so that $\Pr[Z_n > \mu_n + \lambda_n\sigma_n] \to \Pr[N > \lambda]$. There is a huge
literature on asymptotic normality but, for the most part, asymptotic normality
is not covered in this work.


2. Large Deviations. $\lambda_n \to +\infty$ and $\lambda_n = o(\sigma_n)$. One hopes to show that $Z_n$ is
approximately normal in the sense that $\ln \Pr[Z_n > \mu_n + \lambda_n\sigma_n] \sim -\lambda_n^2/2$.


3. Very Large Deviations. $\lambda_n \to +\infty$ and $\lambda_n = \Omega(\sigma_n)$. Here the approximation
of $Z_n$ by the normal distribution generally fails but one hopes that the asymptotics
of $\ln \Pr[Z_n > \mu_n + \lambda_n\sigma_n]$ may still be found by the methods we have
given.


A.3 EXERCISES


1. The Hajós number of a graph $G$ is the maximum number $k$ such that there
are $k$ vertices in $G$ with a path between each pair so that all the $\binom{k}{2}$ paths are
internally pairwise vertex disjoint (and no vertex is an internal vertex of a path
and an endpoint of another). Is there a graph whose chromatic number exceeds
twice its Hajós number?


2. For two subsets $A$ and $B$ of the set $Z_m$ of integers modulo $m$ and for $g \in Z_m$,
denote

$$s(A, B, g) = |\{(a, b) : a \in A, b \in B, a + b = g\}|.$$

For a partition of $Z_m$ into two disjoint sets $Z_m = A \cup B$, $A \cap B = \emptyset$ denote

$$c(A, B) = \max_{x \in Z_m} |s(A, A, x) + s(B, B, x) - 2s(A, B, x)|.$$

Prove that for every odd $m$ there is a partition of $Z_m$ into two disjoint sets $A$
and $B$ such that $c(A, B) = O(\sqrt{m \log m})$.


3. For $a \in (0, 1)$ apply Theorem A.2.3 to find $\lim_n (1/n) \ln \Pr[S_n > an]$ explicitly.
Express $\Pr[S_n > an]$ combinatorially as $2^{-n}$ times the sum of binomial
coefficients. Use Stirling’s formula to asymptotically evaluate this sum and
show that you get the same result for $\lim_n (1/n) \ln \Pr[S_n > an]$.


4. More generally, for $p \in (0, 1)$ fixed, apply Theorem A.2.3 to find the asymptotics
of $\ln \Pr[B(n, p) > an]$ for $p < a < 1$ and of $\ln \Pr[B(n, p) < an]$ for
$0 < a < p$. Show that an application of Stirling’s formula gives the same
answer.


5. Let $\{X_i\}_{i=1}^{n}$ be independent random variables, each chosen uniformly from
$\{+1, +2, -3\}$. Set $Y_n = \sum_{i=1}^{n} X_i$. Let $f(n)$ be the minimal value so that
$\Pr[Y_n > f(n)] < 1/n$. Find the asymptotics of $f(n)$. Redo with $1/n$ replaced
by $n^{-50}$. (Note that it doesn’t change the answer much!)


THE PROBABILISTIC LENS:
Triangle-Free Graphs Have Large Independence Numbers


Let $\alpha(G)$ denote the independence number of a graph $G$. It is easy and well known
that for every graph $G$ on $n$ vertices with maximum degree $d$, $\alpha(G) \ge n/(d + 1)$.
Ajtai, Komlós and Szemerédi (1980) showed that in case $G$ is triangle-free, this can
be improved by a logarithmic factor and in fact $\alpha(G) \ge (cn \log d)/d$, where $c$ is
an absolute positive constant. Shearer (1983) simplified the proof and improved the
constant factor to $c = 1 + o(1)$. Here is a very short proof, without any attempt
to optimize $c$, which is based on a different technique of Shearer (1995) and its
modification in Alon (1996).


Proposition 1 Let $G = (V, E)$ be a triangle-free graph on $n$ vertices with maximum
degree at most $d \ge 1$. Then $\alpha(G) \ge (n \log d)/8d$, where the logarithm here and in
what follows is in base 2.


Proof. If, say, $d < 16$ the result follows from the trivial bound $\alpha(G) \ge n/(d + 1)$
and hence we may and will assume that $d \ge 16$. Let $W$ be a random independent set
of vertices in $G$, chosen uniformly among all independent sets in $G$. For each vertex
$v \in V$ define a random variable $X_v = d|\{v\} \cap W| + |N(v) \cap W|$, where $N(v)$
denotes the set of all neighbors of $v$. We claim that the expectation of $X_v$ satisfies
$E[X_v] \ge \frac{1}{4}\log d$.

To prove this claim, let $H$ denote the induced subgraph of $G$ on $V - (N(v) \cup \{v\})$,
fix an independent set $S$ in $H$ and let $X$ denote the set of all non-neighbors of $S$ in
the set $N(v)$, $|X| = x$. It suffices to show that the conditional expectation

$$E[X_v \mid W \cap V(H) = S] \ge \frac{\log d}{4} \qquad (1)$$

for each possible $S$. Conditioning on the intersection $W \cap V(H) = S$ there are
precisely $2^x + 1$ possibilities for $W$: one in which $W = S \cup \{v\}$ and $2^x$ in which
$v \notin W$ and $W$ is the union of $S$ with a subset of $X$. It follows that the conditional
expectation considered in (1) is precisely

$$\frac{d}{2^x + 1} + \frac{x 2^{x-1}}{2^x + 1}.$$

To check that the last quantity is at least $\frac{1}{4}\log d$ observe that the assumption that
this is false implies that $x \ge 1$ and $2^x(\log d - 2x) > 4d - \log d$, showing that
$\log d > 2x \ge 2$ and hence $4d - \log d < \sqrt{d}(\log d - 2)$, which is false for all $d \ge 16$.
Therefore

$$E[X_v \mid W \cap V(H) = S] \ge \frac{\log d}{4},$$

establishing the claim.
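The inequality behind the claim can be checked mechanically for a range of parameters. The sketch below evaluates the conditional expectation $d/(2^x + 1) + x2^{x-1}/(2^x + 1)$ with exact rational arithmetic and compares it with $\frac{1}{4}\log d$. The function names and the tested range of $d$ are illustrative assumptions; the restriction $0 \le x \le d$ reflects that $X$ is a subset of $N(v)$.

```python
from fractions import Fraction
import math


def conditional_expectation(d, x):
    """Exact value of d / (2^x + 1) + x * 2^(x - 1) / (2^x + 1).

    Numerator and denominator are doubled to avoid the fractional power at x = 0.
    """
    return Fraction(2 * d + x * 2 ** x, 2 * (2 ** x + 1))


def claim_holds(d, x):
    """Check that the conditional expectation is at least (log_2 d) / 4."""
    return float(conditional_expectation(d, x)) >= math.log2(d) / 4


if __name__ == "__main__":
    ok = all(claim_holds(d, x) for d in range(16, 201) for x in range(0, d + 1))
    print("claim verified for 16 <= d <= 200:", ok)
```

A finite check of this kind is only a sanity test and does not replace the argument in the proof, which covers all $d \ge 16$.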

By linearity of expectation we conclude that the expected value of the sum
$\sum_{v \in V} X_v$ is at least $\frac{1}{4}n \log d$. On the other hand, this sum is clearly at most
$2d|W|$, since each vertex $u \in W$ contributes $d$ to the term $X_u$ in this sum, and its
degree in $G$, which is at most $d$, to the sum of all other terms $X_v$. It follows that the
expected size of $W$ is at least $(n \log d)/8d$, and hence there is an independent set of
size at least this expectation, completing the proof. ∎


The Ramsey number $R(3, k)$ is the minimum number $r$ such that any graph with
at least $r$ vertices contains either a triangle or an independent set of size $k$. The
asymptotic behavior of this function has been studied for over fifty years. It turns
out that $R(3, k) = \Theta(k^2/\log k)$. The lower bound is a recent result of Kim (1995),
based on a delicate probabilistic construction together with some thirty pages of
computation. There is no known explicit construction of such a graph, and the largest
known explicit triangle-free graph with no independent set of size $k$, described in
Alon (1994), has only $\Theta(k^{3/2})$ vertices. The tight upper bound for $R(3, k)$, proved
in Ajtai et al. (1980), is a very easy consequence of the above proposition.


Theorem 2 [Ajtai et al. (1980)] There exists an absolute constant $b$ such that
$R(3, k) \le bk^2/\log k$ for every $k > 1$.


Proof. Let $G = (V, E)$ be a triangle-free graph on $8k^2/\log k$ vertices. If $G$ has a
vertex of degree at least $k$ then its neighborhood contains an independent set of size
$k$. Otherwise, by Proposition 1 above, G contains an independent set of size at least

$$\frac{8k^2}{\log k} \cdot \frac{\log k}{8k} = k.$$

Therefore, in any case $\alpha(G) \ge k$, completing the proof. ∎


Appendix B 
Paul Erdős


Working with Paul Erdős was like taking a walk in the hills. Every time when I
thought that we had achieved our goal and deserved a rest, Paul pointed to the 
top of another hill and off we would go. 


— Fan Chung 


B.1 PAPERS 


Paul Erdős was the most prolific mathematician of the twentieth century, with over
1500 written papers and more than 490 collaborators. This highly subjective list gives
only some of the papers that created and shaped the subject matter of this volume. MR
and Zbl. refer to reviews in Math Reviews and Zentralblatt, respectively. Chapter
and section references are to pertinent areas of this volume.


• A combinatorial problem in geometry, Compositio Math 2 (1935), 463-470
(with George Szekeres); Zbl. 12, 270.
Written when Erdős was still a teenager this gem contains a rediscovery of
Ramsey’s Theorem and the Monotone Subsequence Theorem. Many authors
have written that this paper played a key role in moving Erdős toward a more
combinatorial view of mathematics.

• Some remarks on the theory of graphs, Bull. Am. Math. Soc. 53 (1947),
292-294, MR 8# 479d; Zbl. 32, 192.

The three-page paper that “started” the probabilistic method, giving an exponential
lower bound on the Ramsey number $R(k, k)$. Section 1.1.


• The Gaussian law of errors in the theory of additive number theoretic functions,
Am. J. Math. 62 (1940), 738-742 (with Mark Kac); MR 2# 42c; Zbl. 24, 102.
Showing that the number of prime factors of $x$ chosen uniformly from 1 to $n$
has an asymptotically normal distribution. A connection between probability
and number theory that was extraordinary for its time. Section 4.2.


• Problems and results in additive number theory, Colloque sur la Théorie des
Nombres, Bruxelles, 1955, 127-137, George Thone, Liège; Masson and Cie,
Paris, 1956; MR 18# 18a; Zbl. 73, 31.

Using random subsets to prove the existence of a set of integers such that every
$n$ is represented $n = x + y$ at least once but at most $c \ln n$ times. Resolving
a problem Sidon posed to Erdős in the 1930s. This problem continued to
fascinate Erdős: see, e.g., Erdős and Tetali (1990). Section 8.6.


• On a combinatorial problem, Nordisk Mat. Tidsskr. 11 (1963), 220-223; MR
28# 4068; Zbl. 122, 248.

• On a combinatorial problem II, Acta Math. Acad. Sci. Hung. 15 (1964),
445-447; MR 29# 4700; Zbl. 201, 337.

Property B. Probabilistic proofs that any $m < 2^{n-1}$ $n$-sets can be two-colored
with no set monochromatic yet there exist $cn^2 2^n$ $n$-sets that cannot be so
colored. Section 1.3.


• On the evolution of random graphs, Magyar. Tud. Akad. Mat. Kutató Int.
Közl. 5 (1960), 17-61 (with Alfréd Rényi); MR 23# A2338; Zbl. 103, 163.
Rarely in mathematics can an entire subject be traced to one paper. For random
graphs this is the paper. Chapter 10.


• Graph theory and probability, Can. J. Math. 11 (1959), 34-38; MR 21# 876;
Zbl. 84, 396.

Proving by probabilistic methods the existence of graphs with arbitrarily high
girth and chromatic number. This paper convinced many of the power of the
methodology, as the problem had received much attention but no construction
had been found. The Probabilistic Lens: High Girth and High Chromatic
Number, following Chapter 3.


• Graph theory and probability II, Can. J. Math. 13 (1961), 346-352; MR 22#
10925; Zbl. 97, 391.

Showing the existence of a triangle-free graph on $n$ vertices with no independent
set of size $cn^{1/2} \ln n$ vertices, and hence that the Ramsey number
$R(3, k) = \Omega(k^2 \ln^{-2} k)$. A technical tour de force that uses probabilistic
methods in a very subtle way, particularly considering the early date of publication.




• On circuits and subgraphs of chromatic graphs, Mathematika 9 (1962), 170-175;
MR 25# 3035; Zbl. 109, 165.

Destroying the notion that chromatic number is necessarily a local property,
Erdős proves the existence of a graph on $n$ vertices that cannot be $k$-colored
but for which every $\epsilon n$ vertices can be three-colored. The Probabilistic Lens:
Local Coloring, following Chapter 8.


• On a combinatorial game, J. Combin. Theory, Ser. A 14 (1973), 298-301 (with
John Selfridge); MR 48# 5655; Zbl. 293, 05004.

Players alternate turns selecting vertices and the second player tries to stop
the first from getting a winning set. The weight function method used was
basically probabilistic and was an early use of derandomization. Section 16.1.


B.2 CONJECTURES


Conjectures were always an essential part of the mathematical life of Paul Erdős.
Here are some of our favorites.


• Do sets of integers of positive density necessarily contain arithmetic progressions
of arbitrary length? In finite form, is there for all $k$ and all $\epsilon > 0$, an
$n_0$ so that if $n > n_0$ and $S$ is a subset of the first $n$ integers of size at least
$\epsilon n$ then $S$ necessarily contains an arithmetic progression of length $k$? This
conjecture was first made by Paul Erdős and Paul Turán in the 1930s. It was
solved (positively) by Szemerédi in the 1970s. Let $F(k, \epsilon)$ denote the minimal
$n_0$ that suffices above. The growth rate of $F$ remains an intriguing question
with very recent results due to Gowers.


• Call distinct $S, T, U$ a $\Delta$-system if $S \cap T = S \cap U = T \cap U$. Let $F(n)$ be the
minimal $m$ such that given any $m$ $n$-sets some three form a $\Delta$-system. Erdős
and Rado showed that $F(n)$ exists and gave the upper bound $F(n) \le 2^n n!$.
Erdős conjectured that $F(n) < C^n$ for some constant $C$.


• What are the asymptotics of the Ramsey function $R(k, k)$? In particular, what
is the value $c$ (if it exists) of $\lim_k R(k, k)^{1/k}$? The classic 1947 paper of Erdős
gives $c \ge \sqrt{2}$ and $c \le 4$ follows from the proof of Ramsey’s Theorem but a
half-century has seen no further improvements in $c$, though there have been
some results on lower order terms.


• Write $r_S(n)$ for the number of solutions to the equation $n = x + y$ with
$x, y \in S$. Does there exist a set $S$ of positive integers such that $r_S(n) > 0$ for
all but finitely many $n$ yet $r_S(n)$ is bounded by some constant $K$? The 1955
paper of Erdős referenced above gives $S$ with $r_S(n) = O(\ln n)$.


• Let $m(n)$, as defined in Section 1.3, denote the minimal size of a family of
$n$-sets that cannot be two-colored without forming a monochromatic set. What are
the asymptotics of $m(n)$? In 1963 and 1964 Erdős found the bounds
$\Omega(2^n) \le m(n) = O(2^n n^2)$ and the lower bound of Radhakrishnan and Srinivasan,
shown in Section 3.5, is now $\Omega(2^n (n/\ln n)^{1/2})$.


• Given $2^{n-2} + 1$ points in the plane, no three on a line, must some $n$ of them
form a convex set? This conjecture dates back to the 1935 paper of Erdős and
Szekeres referenced above.


• Let $m(n, k, l)$ denote the size of the largest family of $k$-element subsets of
an $n$-set such that no $l$-set is contained in more than one of them. Simple
counting gives $m(n, k, l) \le \binom{n}{l}/\binom{k}{l}$. Erdős and Hanani conjectured in 1963
that for fixed $l < k$ this bound is asymptotically correct; that is, the ratio of
$m(n, k, l)$ to $\binom{n}{l}/\binom{k}{l}$ goes to one as $n \to \infty$. Erdős had a remarkable ability
to select problems that were very difficult but not impossible. This conjecture
was settled affirmatively by Vojtěch Rödl in 1985, as discussed in Section 4.7.
The asymptotics of the difference $\binom{n}{l}/\binom{k}{l} - m(n, k, l)$ remains open.


B.3 ON ERDŐS


There have been numerous books and papers written about the life and mathematics
of Paul Erdős. Three deserving particular mention are:


• The Mathematics of Paul Erdős (Ron Graham and Jarik Nešetřil, eds.),
Springer-Verlag, Berlin, 1996 (Vols. I and II).


• Combinatorics, Paul Erdős Is Eighty (D. Miklós, V. T. Sós, T. Szőnyi, eds.),
Bolyai Soc. Math. Studies, Vol. I (1990) and Vol. II (1993).


• Erdős on Graphs: His Legacy of Unsolved Problems, Fan Chung and Ron
Graham, A.K. Peters, 1998.


Of the many papers by mathematicians we note the following: 


• László Babai, In and out of Hungary: Paul Erdős, his friends, and times. In
Combinatorics, Paul Erdős Is Eighty (listed above), Vol. II, 7-93.


• Béla Bollobás, Paul Erdős: Life and work, in The Mathematics of Paul Erdős
(listed above), Vol. II, 142.


• A. Hajnal, Paul Erdős’ Set theory, in The Mathematics of Paul Erdős (listed
above), Vol. II, 352-393.


• János Pach, Two places at once: a remembrance of Paul Erdős, Math
Intelligencer, Vol. 19 (1997), no. 2, 38-48.


Two popular biographies of Erdős have appeared:


• The Man Who Loved Only Numbers, Paul Hoffman, Hyperion, New York,
1998.

• My Brain Is Open: The Mathematical Journeys of Paul Erdős, Bruce Schechter,
Simon & Schuster, New York, 1998.


Finally, George Csicsery has made a documentary film, N Is a Number, A Portrait
of Paul Erdős, available from the publishers A. K. Peters, which allows one to see
and hear Erdős in lecture and among friends, proving and conjecturing.


B.4 UNCLE PAUL 


Paul Erdős died in September 1996 at the age of 83. His theorems and conjectures
permeate this volume. This tribute,¹ given by Joel Spencer at the National Meeting
of the American Mathematical Society in January 1997, attempts to convey some of
the special spirit that we and countless others took from this extraordinary man.


Paul Erdős was a searcher, a searcher for mathematical truth.

Paul’s place in the mathematical pantheon will be a matter of strong debate for in
that rarefied atmosphere he had a unique style. The late Ernst Straus said it best, in a
commemoration of Erdős’ seventieth birthday.


In our century, in which mathematics is so strongly dominated by “theory 
constructors” he has remained the prince of problem solvers and the absolute 
monarch of problem posers. One of my friends – a great mathematician in his
own right – complained to me that “Erdős only gives us corollaries of the great
metatheorems which remain unformulated in the back of his mind.” I think there
is much truth to that observation but I don’t agree that it would have been either
feasible or desirable for Erdős to stop producing corollaries and concentrate on
the formulation of his metatheorems. In many ways Paul Erdős is the Euler of
our times. Just as the “special” problems that Euler solved pointed the way to
analytic and algebraic number theory, topology, combinatorics, function spaces,
etc.; so the methods and results of Erdős’ work already let us see the outline
of great new disciplines, such as combinatorial and probabilistic number theory, 
combinatorial geometry, probabilistic and transfinite combinatorics and graph 
theory, as well as many more yet to arise from his ideas. 


Straus, who worked as an assistant to Albert Einstein, noted that Einstein chose 
physics over mathematics because he feared that one would waste one’s powers in 
pursuing the many beautiful and attractive questions of mathematics without finding 
the central questions. Straus goes on, 


Erdős has consistently and successfully violated every one of Einstein’s prescriptions.
He has succumbed to the seduction of every beautiful problem he has encountered – and
a great many have succumbed to him. This just proves to me that in the search for
truth there is room for Don Juans like Erdős and Sir Galahads like Einstein.


¹Reprinted with permission from the Bulletin of the American Mathematical Society.

I believe, and I’m certainly most prejudiced on this score, that Paul’s legacy will
be strongest in Discrete Math. Paul’s interest in this area dates back to a marvellous
paper with George Szekeres in 1935 but it was after World War II that it really
flourished. The rise of the Discrete over the past half century has, I feel, two main
causes. The first was The Computer, how wonderful that this physical object has led
to such intriguing mathematical questions. The second, with due respect to the many
others, was the constant attention of Paul Erdős with his famous admonition “Prove
and Conjecture!” Ramsey Theory, Extremal Graph Theory, Random Graphs, how
many turrets in our mathematical castle were built one brick at a time with Paul’s
theorems and, equally important, his frequent and always penetrating conjectures.

My own research specialty, The Probabilistic Method, could surely be called The
Erdős Method. It was begun in 1947 with a three page paper in the Bulletin of
the American Math Society. Paul proved the existence of a graph having a certain
Ramsey property without actually constructing it. In modern language he showed
that an appropriately defined random graph would have the property with positive
probability and hence there must exist a graph with the property. For the next twenty
years Paul was a “voice in the wilderness,” his colleagues admired his amazing results
but adoption of the methodology was slow. But Paul persevered – he was always
driven by his personal sense of mathematical aesthetics in which he had supreme
confidence – and today the method is widely used in both Discrete Math and in
Theoretical Computer Science.

There is no dispute over Paul’s contribution to the spirit of mathematics. Paul
Erdős was the most inspirational man I have ever met. I began working with Paul
in the late 1960s, a tumultuous time when “do your own thing” was the admonition 
that resonated so powerfully. But while others spoke of it, this was Paul’s modus 
operandi. He had no job; he worked constantly. He had no home; the world was 
his home. Possessions were a nuisance, money a bore. He lived on a web of trust, 
travelling ceaselessly from Center to Center, spreading his mathematical pollen. 

What drew so many of us into his circle? What explains the joy we have in speaking 
of this gentle man? Why do we love to tell Erdős stories? I’ve thought a great deal
about this and I think it comes down to a matter of belief, or faith. We mathematicians
know the beauties of our subject and we hold a belief in its transcendent quality. God
created the integers, the rest is the work of Man. Mathematical truth is immutable,
it lies outside physical reality. When we show, for example, that two nth powers
never add to an nth power for $n \ge 3$ we have discovered a Truth. This is our
belief, this is our core motivating force. Yet our attempts to describe this belief 
to our nonmathematical friends are akin to describing the Almighty to an atheist. 
Paul embodied this belief in mathematical truth. His enormous talents and energies 
were given entirely to the Temple of Mathematics. He harbored no doubts about the 
importance, the absoluteness, of his quest. To see his faith was to be given faith. The 
religious world might better have understood Paul’s special personal qualities. We 
knew him as Uncle Paul. 

I do hope that one cornerstone of Paul’s, if you will, theology will long survive.
I refer to The Book. The Book consists of all the theorems of mathematics. For
each theorem there is in The Book just one proof. It is the most aesthetic proof, the
most insightful proof, what Paul called The Book Proof. And when one of Paul’s
myriad conjectures was resolved in an “ugly” way Paul would be very happy in
congratulating the prover but would add, “Now, let’s look for The Book Proof.” This
platonic ideal spoke strongly to those of us in his circle. The mathematics was there,
we had only to discover it.

The intensity and the selflessness of the search for truth were described by the 
writer Jorge Luis Borges in his story “The Library of Babel.” The narrator is a
worker in this library which contains on its infinite shelves all wisdom. He wanders
its infinite corridors in search of what Paul Erdős might have called The Book. He
cries out, 


To me, it does not seem unlikely that on some shelf of the universe there lies a 
total book. I pray the unknown gods that some man — even if only one man, and 
though it have been thousands of years ago! — may have examined and read it. 
If honor and wisdom and happiness are not for me, let them be for others. May 
heaven exist though my place be in hell. Let me be outraged and annihilated but 
may Thy enormous Library be justified, for one instant, in one being. 


In the summer of 1985 I drove Paul to what many of us fondly remember as Yellow 
Pig Camp — a mathematics camp for talented high school students at Hampshire 
College. It was a beautiful day — the students loved Uncle Paul and Paul enjoyed 
nothing more than the company of eager young minds. In my introduction to his 
lecture I discussed The Book but I made the mistake of describing it as being “held 
by God.” Paul began his lecture with a gentle correction that I shall never forget. 
“You don’t have to believe in God,” he said, “but you should believe in The Book.”